Issue with /tmp files being auto created and stored on server

Hello, sorry for the long post, but I wanted to be thorough. We are having a BIG problem with our server and I am praying you guys can help. So, where to start...I guess the beginning...

About 6-8 weeks ago our server started crashing once a day or so. Whenever a crash occurred, if you visited one of the sites we manage, the spinning browser wheel would never stop spinning and the site would never load. No error messages ever came up. To temporarily fix it, I would go in and click Reboot Server from the Server Admin area of our cPanel. This would bring the sites back online for about a day until the server crashed again.

These are the things I tried doing the first week:

1) Deleted all unnecessary themes and plugins that were not being used or were deactivated.

2) Deleted all unused database installations, and repaired and optimized all MySQL databases.

That did not solve the problem. Our server continued to crash daily. Strangely enough, we then started receiving this error message from the server daily:

The chkservd sub-process with pid 5764 ran for 602 seconds. This sub-process was terminated when it exceeded the time allowed between checks, which is 300 seconds. To determine why, you can check /var/log/chkservd.log and /usr/local/cpanel/logs/tailwatchd_log.

You likely received this notification as a symptom of a larger problem. If your server is experiencing a high load, we recommend investigating the cause. If you continue to receive this notification, it is likely that your system is unable to handle demand or a misconfiguration is delaying restarts.

If you are sure that no misconfigurations exist, you should consider gradually increasing the following options in WHM's "Tweak Settings" feature: "The number of times ChkServd will allow a previous check to complete before terminating the check" and/or "The number of seconds between ChkServd service checks".
Server: server.e3corporate.com
Primary IP: 192.163.227.125
Service: chkservd
Notification Type: hang
Memory Information:

Used: 3482MB
Available: 338MB
Installed: 3830MB

Load Information: 387.87 325.66 182.21
Uptime: 0 days, 22 hours, 2 seconds
IOStat Information:

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          14.97   1.35     4.59     2.43    0.00  76.66

Device:    tps  Blk_read/s  Blk_wrtn/s   Blk_read   Blk_wrtn
vda      26.53     1726.92     2540.41  136988074  201518008
vdb       6.36      139.77      179.82   11086872   14264584

Frustrated, I finally hired W3 Edge to professionally install and configure their W3 Total Cache plugin on our three resource-heavy sites: e3ngage.com, suspicious0bservers.org, and jumpwithjill.com.

They completed these optimizations about a week ago.

The server crashes continued. BlueHost then said it might be because one of our sites (suspicious0bservers.org) was under a DDoS brute-force attack, which was chewing up server resources. So we installed CloudProxy Firewall from Sucuri Security on that site. This did not entirely solve the problem either. I then talked to BlueHost for a long time and they said our /tmp and /var/tmp/ folders on the server were being filled with files, which was eating up memory. They suggested hiring Sucuri Security to store our website backups off of our own server, clearing out all the backups that were already on the server, and setting the server's backup options to never take a backup - all of which we did. They also cleared out these two folders.

That SORT OF worked. The server has not crashed in two days. However, at about the same time yesterday and today we got this alert email from BlueHost:

The file system /usr/tmpDSK, which is mounted at /tmp, has reached critical status because it is 99% full.
Server: server.e3corporate.com
Primary IP: 192.163.227.125
Notification Type: diskcritical
Filesystem: /usr/tmpDSK
Mount Point: /tmp
Percentage Full: 99%
Disk Information:

Used: 3.71GB
Available: 0.03GB
Total: 3.94GB

ChkServd Version: 15.2

And this alert email:

The file system /tmp, which is mounted at /var/tmp, has reached critical status because it is 99% full.
Server: server.e3corporate.com
Primary IP: 192.163.227.125
Notification Type: diskcritical
Filesystem: /tmp
Mount Point: /var/tmp
Percentage Full: 99%
Disk Information:

Used: 3.71GB
Available: 0.03GB
Total: 3.94GB

ChkServd Version: 15.2

After speaking with BlueHost for over two hours, they said they are stumped. They said they cannot see where these files are being auto-generated from. They also said that if we don't fix this issue soon we could have an unrecoverable server crash. They did say they saw a number of files in these folders with "APC" and "sessions" extensions, but that is all they could tell. They said it COULD be a bad plugin somewhere (although I am good at only using well-respected plugins like WPMU, Sucuri, etc.). They also said it COULD just be a bit of bad code written somewhere, but short of a full server reset (which is not an option) they couldn't really provide any more help. Oh, and they also said they've "never seen a folder like /usr/tmpDSK and they are not sure what it is doing or why it is there".

Anyways, I'm about to throw up the white flag. Do you think there is a bad plugin or bit of code somewhere? Do we simply need to buy a bigger server?? I doubt that because we are a small operation, right?? Is there something I need to install or configure differently or get rid of or upgrade? AHHH, I have been trying to figure this out for two months now. I wake up every morning afraid I'll look at my phone to dozens of angry client texts because their sites were down all night. Sigh...ANY advice or suggestions would be IMMENSELY appreciated, I'm going to lose my mind!

Thanks very much and again sorry for the long email!

Best,

AD

  • Vaughan

    hiya

    tmpDSK is just a mounted virtual temporary disk hence tmpDSK.

    i think it's common when using WHM & VPS servers.

    however that doesn't explain why it keeps filling up, would there be any chance of seeing the filenames of any files in there?

    if it's mainly session data, i don't know.

    check your server's php.ini and make sure session.use_only_cookies is enabled.

    you could also try changing the garbage collection settings, for example session.gc_maxlifetime.

    also, if you have changed session.save_path to a different folder off root, the automatic garbage collector sometimes will not work, so old session files are never deleted and the folder soon fills up. in that case you need to create a cron job to do the gc instead.

    it might be nothing to do with that though.

    if you can, disable a few plugins for a few days and keep monitoring. if it still fills up, you know it's not those plugins.

    hope this helps.
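The cron-based cleanup suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `purge_stale_files` and its parameters are hypothetical, it assumes PHP's file-based session handler (whose files are named with a `sess_` prefix), and it has not been tried on this server. It defaults to a dry run so nothing is deleted until the list of matches has been checked.

```python
import os
import time


def purge_stale_files(directory, max_age_seconds, prefix="sess_",
                      dry_run=True, now=None):
    """Find files directly inside `directory` whose names start with `prefix`
    and whose last-modified time is older than `max_age_seconds`.

    Returns the sorted list of matching paths. The files are only deleted
    when dry_run is False. (Hypothetical helper, not part of PHP or cPanel.)
    """
    now = time.time() if now is None else now
    stale = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.name.startswith(prefix):
                continue
            # Regular files only; symlinks and subfolders are left alone.
            if not entry.is_file(follow_symlinks=False):
                continue
            try:
                mtime = entry.stat(follow_symlinks=False).st_mtime
            except FileNotFoundError:
                continue  # the file vanished between listing and stat
            if now - mtime > max_age_seconds:
                stale.append(entry.path)

    if not dry_run:
        for path in stale:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass  # already removed by something else
    return sorted(stale)
```

A call such as `purge_stale_files("/tmp", 1440)` would only list candidates (1440 seconds is PHP's default for session.gc_maxlifetime). The directory should match whatever session.save_path is actually set to, and `dry_run=False` should only be passed once the list looks right.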

  • Adrian

    Hey, thanks very much for the help. I went through every folder and directory using FileZilla and could not find the /usr/tmpDSK folder! So I don't know where that one is, but I went through the /tmp folder (which is also getting that 99% full error daily) and looked through all the folders. Honestly, I'm not exactly sure what I'm looking at, but there are tons of .png files that look like "usage reports". There is also a roughly 2 GB "user qquota" file... I also see:

    - installedapps.yml
    - a bunch of log files from June and July 2013 in the "slowmysql" folder

    Would it be OK to send over my FTP info and have you take a look? I'm not super familiar with what files are OK to be in here, so it would be great to have a second opinion.

    Thanks and regards,

    AD
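Rather than browsing /tmp by hand over FTP, the filenames and sizes asked for earlier in the thread could be collected with a short script run on the server. The following is a minimal sketch, assuming shell access and Python are available there. The function names are made up for illustration, and the sizes are apparent file sizes, so a sparse file may occupy less disk space than reported.

```python
import os
import stat
from collections import defaultdict


def iter_regular_files(root):
    """Yield (path, size_in_bytes) for every regular file under root.
    Symlinks are skipped and unreadable folders are silently ignored."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file disappeared or is not accessible
            if stat.S_ISREG(info.st_mode):
                yield path, info.st_size


def largest_files(root, limit=20):
    """Return up to `limit` (path, size) pairs, biggest files first."""
    files = sorted(iter_regular_files(root),
                   key=lambda item: item[1], reverse=True)
    return files[:limit]


def bytes_per_top_level_entry(root):
    """Total bytes grouped by the first path component below root, so one
    large folder or one large file stands out immediately."""
    totals = defaultdict(int)
    for path, size in iter_regular_files(root):
        top = os.path.relpath(path, root).split(os.sep, 1)[0]
        totals[top] += size
    return dict(totals)
```

Printing `largest_files("/tmp")` and `bytes_per_top_level_entry("/tmp")` once when the folder is nearly empty and again when the 99% alert arrives should show which files or folders grew in between.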
