#archiveteam 2012-06-04,Mon


Time Nickname Message
00:06 🔗 S[h]O[r]T not me but i have ipv6 connectivity if needed
01:15 🔗 Zebranky SketchCow: 200k. WTF. This community is ridiculous.
02:26 🔗 godane i found some xplay videos
02:26 🔗 godane from 2011 though
02:55 🔗 shaqfu SketchCow around, or is he still knee-deep in doc shooting?
02:59 🔗 shaqfu (And is he still looking for arms for the CHM project?)
03:08 🔗 Coderjoe most likely in the air or sleeping
03:11 🔗 _fox running some warriors
03:11 🔗 _fox this is p neat
03:19 🔗 _fox http://diybookscanner.myshopify.com/products/diy-book-scanner-kit
03:23 🔗 DrainLbry $595 seems hella steep for that
03:23 🔗 DrainLbry erm, bicycle level triggers though ok yeah this is a bit fancier than i thought initially
03:23 🔗 DrainLbry god knows i couldnt build it
07:18 🔗 Aranje so do I want to even ask about http://blog.picplz.com/day/2012/06/01/
07:24 🔗 SketchCow Not without glancing at http://picplz.heroku.com/
07:28 🔗 Aranje oh wonderful
08:07 🔗 ersi Aranje: #piczzz
08:08 🔗 Aranje ersi, thanks!
08:08 🔗 ersi and http://archiveteam.org/index.php?title=Picplz of course :)
08:08 🔗 ersi no prob
08:08 🔗 Aranje yep yep, running it already
08:09 🔗 Aranje trying to fish out exactly what the requirements are for compiling wget warc so I can make lists of things people need installed on various linuxes (and in my particular case, freebsd)
08:09 🔗 Aranje pending tomorrow though, I want to sleep
08:11 🔗 RedType id love to calculate the size of the internet archive per year
08:11 🔗 RedType so every active page that was up on january 1st or w/e for each year
08:12 🔗 ersi Aranje: => lua 5.1 basically
08:12 🔗 Aranje so gnutls-dev, lua (headers, or just runtime?) build-essential
08:13 🔗 ersi on debian/ubuntu I needed liblua-5.1-dev
08:14 🔗 Aranje bah freebsd won't let me install gnutls-dev because theirs is vulnerable to 2 different security issues
08:14 🔗 Aranje awesome
08:14 🔗 chronomex I use debian, had to apt-get install liblua5.1.0-dev
08:14 🔗 chronomex ha
08:14 🔗 Aranje portaudit is cockblocking me
08:14 🔗 chronomex er
08:14 🔗 Aranje (like a good app)
08:14 🔗 alard OpenSSL should work too.
08:14 🔗 chronomex apt-get install liblua5.1.0-dev lua5.1
08:26 🔗 Aranje oof
08:26 🔗 Aranje first line of the script is a doozy
08:26 🔗 Aranje /bin/bash doesn't exist (bash exists, but it doesn't reside there!)
08:27 🔗 ersi yeah, how about not using a bsd machine *trolls*
08:27 🔗 Aranje Hey mang, it's what I got
08:27 🔗 Aranje already running the debian machine
08:27 🔗 Aranje aka easymode
08:28 🔗 Aranje on the plus side, freebsd needs a grand total of two things
08:28 🔗 Aranje (assuming the compile works, that is)
08:28 🔗 Aranje or not. it failed to find a lua, probably because there is no development lua port
12:35 🔗 Cameron_D Ooh, I just found more Splinder data. Need to run the script over that
13:15 🔗 Coderjoe Aranje: there may be other things needed on bsd. I'm pretty sure that the script expects GNU userspace tools that it may use
13:17 🔗 Coderjoe mmm
13:18 🔗 Coderjoe pretty much all of the lifting is done in lua now. nice.
13:59 🔗 jjonas here is 1000 more usernames food for the me.com crawlers http://pastebin.com/ZDBY8pf1
13:59 🔗 jjonas who has access to memac.heroku.com?
14:00 🔗 ersi are you the famous jonas?
14:01 🔗 ersi nevermind
14:02 🔗 jjonas mh ?:D
14:11 🔗 Coderjoe alard: ^
14:22 🔗 alard jjonas: Thanks. I pasted them in here: http://memac.heroku.com/rescue-me (in batches)
14:25 🔗 Schbirid gamespy/IGN so reminds me of yahoo
14:58 🔗 Schbirid today's ovh beta server giveaway is at 21:00 UTC
14:59 🔗 jjonas yay
15:00 🔗 jjonas ;)
15:00 🔗 Schbirid as before, you need to follow their twitter account https://twitter.com/#!/OVH
15:01 🔗 Schbirid cant be arsed to find the urls, pick any ovh site. servers -> us flag, then you should see the signup form
15:01 🔗 Schbirid iirc i hammered it starting 2 minutes before the announced time and got through
15:19 🔗 SketchCow Hey, so thing.
15:19 🔗 SketchCow Tomorrow, SF is doing a big fiber outage
15:19 🔗 SketchCow Expected 8-12 hour outage.
15:20 🔗 SketchCow Archive.org is working to see about keeping itself up.
15:20 🔗 SketchCow But it might not happen.
15:26 🔗 Coderjoe ouch
15:26 🔗 SmileyG O
15:26 🔗 SmileyG :
15:27 🔗 alard Well, if they can keep themselves up, I'm sure we can arrange they'll get some traffic.
15:27 🔗 SmileyG wtf why can't I type anymore :o
15:27 🔗 Coderjoe any public details as to why there is going to be a big fiber outage?
15:27 🔗 SmileyG not got redundant linking?
15:29 🔗 Coderjoe SmileyG: if it affects multiple providers, redundant links are moot
15:29 🔗 SmileyG ....
15:29 🔗 SmileyG then its not redundant :)
15:29 🔗 SmileyG I mean if the whole AREA is offline yeah sure you're screwed.
15:30 🔗 SmileyG But any provider having outages.... that shouldn't effect others.
15:30 🔗 SmileyG affect? :/
15:32 🔗 Schbirid i thought archive.org had redundant hosting
15:33 🔗 Coderjoe shouldn't affect / shouldn't have an effect
15:33 🔗 SketchCow It's an SF thing
15:34 🔗 Schbirid http://www.worldofmule.net/tiki-index.php?page=IBM%20PC
15:34 🔗 SketchCow All I know is, this was a big discussion at the lunch, I wanted you guys to know.
15:35 🔗 SketchCow I was in on that M.U.L.E. thing
15:35 🔗 Schbirid yeah, your name is in some of the writeups
15:35 🔗 Schbirid :)
15:36 🔗 Coderjoe i can't seem to find anything about it, but searching for anything these days sucks
15:39 🔗 yipdw interestingly, "sf internet" on search.twitter.com turns up results for "Eduard Khil has died"
16:02 🔗 SketchCow http://nedbatchelder.com/blog/201201/goodbye_tabblo.html
16:15 🔗 SketchCow Schedule today: Doing a round of cleanup post-trip, sending some e-mails, then down to NYC for a memorial service.
16:31 🔗 Aranje huh, that's odd timing for the fiber outage tomorrow. It's primaries day here in california.
16:43 🔗 DFJustin http://www.cbc.ca/news/canada/story/2012/06/03/pol-campaign-to-oppose-budget-bill.html
16:46 🔗 balrog_ SketchCow: w.r.t your comment on IUMA pictures, for some reason the thumbnail on the wayback link doesn't work but clicking on it does
16:47 🔗 balrog_ but this isn't that consistent
16:48 🔗 joepie91 I has a sad: http://rt.com/art-and-culture/news/trololo-dead-stroke-stpetersburg-898/
16:51 🔗 SketchCow It's not consistent, but I think it can be done.
16:51 🔗 SketchCow I want to wait, rope around, fix it up.
17:22 🔗 Schbirid is wget buggy and forgetting to grab files when mirroring?
17:22 🔗 Schbirid i grabbed a site and random images are missing (other images from the same dir were downloaded fine)
17:26 🔗 SketchCow closure inches slowly toward $20k
17:33 🔗 _fox that's cool
17:35 🔗 SketchCow I need help unpacking a .warc file
17:36 🔗 underscor SketchCow: hanzo's warcextract should do what you need
17:36 🔗 underscor Do you have the warc publicly available
17:37 🔗 SketchCow http://fos.textfiles.com/borscht/
17:47 🔗 underscor grr, netsplut
17:53 🔗 underscor there we go, no more netspluts
17:54 🔗 underscor SketchCow: I was wrong, warcextract only prints a human readable summary. wondering if using alard's warc-proxy and then running wget against it is the best solution...
17:55 🔗 alard What might be interesting is this: modify warcextract so that it doesn't just print the results, but builds a zip file.
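A minimal Python sketch of the idea alard describes here: walk a WARC file and write each HTTP response body into a zip instead of printing a summary. It is not alard's warctozip.py (the real tool is at the GitHub link later in this log). It assumes the WARC/1.0 record layout (a version line, header fields, a blank line, then exactly Content-Length bytes of block) and that response blocks are raw HTTP responses. It does not decode chunked transfer encoding or Content-Encoding, and it drops query strings.

```python
import gzip
import zipfile
from urllib.parse import urlsplit


def iter_warc_records(stream):
    """Yield (headers, block) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank CRLF lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line[:40])
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header fields
            name, _, value = line.decode("latin-1").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Content-Length gives the exact size of the record block
        yield headers, stream.read(int(headers["content-length"]))


def warc_to_zip(warc_path, zip_path):
    """Write the HTTP body of every response record into a zip, named host + path."""
    # gzip.open transparently reads the concatenated per-record gzip members of a .warc.gz
    opener = gzip.open if warc_path.endswith(".gz") else open
    with opener(warc_path, "rb") as warc, \
            zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for headers, block in iter_warc_records(warc):
            if headers.get("warc-type") != "response":
                continue  # skip warcinfo, request, metadata, ... records
            # The block is a raw HTTP response; the body follows the first blank line.
            _, sep, body = block.partition(b"\r\n\r\n")
            if not sep:
                continue
            parts = urlsplit(headers.get("warc-target-uri", "").strip("<>"))
            if not parts.netloc:
                continue  # no usable target URI
            name = parts.netloc + (parts.path or "/")
            if name.endswith("/"):
                name += "index.html"
            zf.writestr(name, body)
```

Usage would mirror the command line alard posts later, e.g. warc_to_zip("somefile.warc.gz", "zipfile.zip"). The same URL fetched twice yields two zip entries with the same name (zipfile warns about this but allows it).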
17:55 🔗 SketchCow Find a solution, top priority.
17:55 🔗 SketchCow I'm now getting barraged with angry, unhappy, sad tabblo ex-users
17:55 🔗 SketchCow Tabblo apparently, based on what I'm getting, did a VERY poor and possibly no job informing people of the shutdown.
18:02 🔗 Schbirid httrack sucks
18:03 🔗 ersi It does, indeed, suck.
18:04 🔗 SketchCow http://archive.org/~edward/search.php
18:04 🔗 SketchCow Faster search
18:04 🔗 underscor also more powerful
18:08 🔗 alard SketchCow: Aren't the zip files interesting to these Tabblo people?
18:08 🔗 SketchCow Give me the search link
18:09 🔗 alard http://archive.org/download/test-memac-index-test/tabblo.html
18:09 🔗 SketchCow Thank you
18:09 🔗 alard (It's a temporary object, will disappear in 30 days.)
18:10 🔗 underscor usually takes a bit longer than 30, actually
18:10 🔗 alard They promised 30. :)
18:10 🔗 alard and if we can't trust the Internet Archive...
18:11 🔗 underscor haha
18:11 🔗 alard Meanwhile, I'm halfway warctozip.py
18:11 🔗 underscor actually, pretty sure there's an automatic task for it now
18:11 🔗 SketchCow So actually, need a little hint here
18:11 🔗 SketchCow It looks like the zip files have all the photos.
18:11 🔗 SketchCow But the .warc files do not
18:11 🔗 SketchCow Is that right?
18:13 🔗 alard The .warc.gz files do not contain the original photos, no.
18:13 🔗 alard The .warc.gz files are what you would see if you went to the web page.
18:13 🔗 SketchCow Ah.
18:14 🔗 SketchCow Wow, so JUST the .warc.gz files are 450gb?
18:14 🔗 alard The zip files were a terabyte, I think.
18:14 🔗 SketchCow mommy
18:15 🔗 alard How does one remove "http://" from a string in python?
18:15 🔗 underscor ?
18:15 🔗 underscor does the normal re module not work
18:16 🔗 underscor (note I am python baby, I just have used it before)
18:16 🔗 alard I had hoped you'd just taptap the answer. A bit lazy. :)
18:16 🔗 alard re.sub
18:16 🔗 ersi I'd just substring it out
18:16 🔗 ersi but I suck :D
18:17 🔗 underscor oh, sorry
18:17 🔗 underscor I don't know the syntax off the top of my head D:
18:17 🔗 joepie91 hacky way would be to split in 3 parts and take the last
18:17 🔗 underscor where: &w_identifier=archiveteam-tabblo* | size: 1,210,601,813 KB
18:17 🔗 ersi print url[7:]
18:17 🔗 underscor SketchCow: ^
18:17 🔗 joepie91 something.split('/', 3)[2]
18:18 🔗 joepie91 ersi: that wouldn't deal with https though
18:18 🔗 ersi joepie91: yeah, it wouldn't :)
18:18 🔗 joepie91 (note: my split syntax may be shoddy, not that experienced in python yet)
18:18 🔗 joepie91 (but it should theoretically work)
18:18 🔗 underscor joepie91: what if you have "http://test.com/a/file/here.txt"
18:18 🔗 joepie91 underscor: the 3 indicates max 3 parts
18:18 🔗 underscor wouldn't that only return test.com?
18:18 🔗 underscor doh
18:18 🔗 joepie91 so after the first two parts it throws the rest into the 3rd element
18:18 🔗 joepie91 anyhow, urllib module probably has a nice way
18:19 🔗 joepie91 or urllib2
18:19 🔗 underscor urllib40000
18:19 🔗 joepie91 lol
18:19 🔗 joepie91 seriously though, I'm particularly good at dirty hacks, but I very much doubt whether that's a good skill...
18:19 🔗 joepie91 :|
18:19 🔗 * joepie91 blames his PHP background
18:20 🔗 underscor I'm trying to figure out if there's a lot of packet loss between me and ia, or between the box I'm sshing through and the target
18:20 🔗 underscor >:I
18:20 🔗 underscor it's atrocious
18:20 🔗 underscor and of course traceroute looks normal
18:20 🔗 SketchCow http://archive.org/search_beta/ is the official beta search.
18:20 🔗 yipdw alard: you probably already got this, but re.sub(r'^http://', '', str)
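For reference, the approaches traded above, made runnable in Python 3. One correction to the split discussion: str.split('/', 3) performs at most three splits and so returns up to four pieces, which means index 2 is only the host (underscor's reading was right) and index 3 is the path without a leading slash. url[7:] only works for "http://", as noted. The function names below are illustrative, not from any project in this log.

```python
import re
from urllib.parse import urlsplit


def strip_scheme_re(url):
    # Drop a leading "http://" or "https://"; other strings come back unchanged.
    return re.sub(r"^https?://", "", url, flags=re.IGNORECASE)


def strip_scheme_urlsplit(url):
    # Let the standard library locate the scheme, then rebuild what follows "//".
    parts = urlsplit(url)
    rest = parts.netloc + parts.path
    if parts.query:
        rest += "?" + parts.query
    if parts.fragment:
        rest += "#" + parts.fragment
    return rest


url = "http://test.com/a/file/here.txt"
print(url.split("/", 3))  # ['http:', '', 'test.com', 'a/file/here.txt']
print(strip_scheme_re(url))  # test.com/a/file/here.txt
```

The regex version is the smallest change; the urlsplit version is handier if other URL components are needed later.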
18:21 🔗 joepie91 underscor: you -> box with ssh -> ia ?
18:22 🔗 underscor me->ia->another ia box
18:22 🔗 joepie91 ok
18:22 🔗 joepie91 SSH into 'ia' and ping 'another ia box'
18:22 🔗 joepie91 if no packet loss, issue is between you and ia
18:22 🔗 joepie91 :P
18:22 🔗 alard https://github.com/alard/warctozip
18:23 🔗 underscor Reply from 207.241.224.4: bytes=32 time=175ms TTL=250
18:23 🔗 underscor Reply from 207.241.224.4: bytes=32 time=179ms TTL=250
18:23 🔗 underscor Reply from 207.241.224.4: bytes=32 time=192ms TTL=250
18:23 🔗 underscor Reply from 207.241.224.4: bytes=32 time=227ms TTL=250
18:23 🔗 alard ./warctozip.py somefile.warc.gz zipfile.zip
18:23 🔗 underscor gross
18:23 🔗 Schbirid underscor: mtr is your friend
18:23 🔗 underscor Reply from 74.125.228.9: bytes=32 time=34ms TTL=252
18:23 🔗 underscor (google.com)
18:24 🔗 underscor hmm, wonder why my latency to ia is so high
18:24 🔗 joepie91 hm, sec
18:25 🔗 joepie91 where's the server you are pinging located physically?
18:25 🔗 joepie91 country or state
18:25 🔗 SmileyG erm
18:25 🔗 underscor that's probably not a fair test
18:25 🔗 SmileyG maybe the work started early?
18:25 🔗 underscor since google is probably down the street
18:25 🔗 underscor and IA is across the country
18:26 🔗 joepie91 Chicago: 4 packets transmitted, 4 received, 0% packet loss, time 3000ms
18:26 🔗 joepie91 rtt min/avg/max/mdev = 57.259/61.906/67.758/3.837 ms
18:26 🔗 joepie91 Atlanta: 4 packets transmitted, 4 received, 0% packet loss, time 2998ms
18:26 🔗 joepie91 rtt min/avg/max/mdev = 62.096/62.531/62.778/0.407 ms
18:26 🔗 SmileyG 64 bytes from 207.241.224.4: icmp_req=1 ttl=51 time=179 ms
18:26 🔗 SmileyG PING 207.241.224.4 (207.241.224.4) 56(84) bytes of data.
18:26 🔗 SmileyG 64 bytes from 207.241.224.4: icmp_req=2 ttl=51 time=185 ms
18:26 🔗 SmileyG --- 207.241.224.4 ping statistics ---
18:26 🔗 SmileyG 64 bytes from 207.241.224.4: icmp_req=3 ttl=51 time=186 ms
18:26 🔗 SmileyG 3 packets transmitted, 3 received, 0% packet loss, time 2002ms
18:26 🔗 SmileyG rtt min/avg/max/mdev = 179.598/183.757/186.218/2.957 ms
18:26 🔗 SmileyG thats from the UK.
18:26 🔗 SketchCow OKAY STOP DOING THIS
18:26 🔗 SketchCow OKAY. STOP. DOING. THIS.
18:27 🔗 joepie91 :)
18:27 🔗 joepie91 Phoenix: rtt min/avg/max/mdev = 30.298/30.862/31.170/0.356 ms
18:27 🔗 * SmileyG has stopped
18:27 🔗 joepie91 seems there's nothing wrong with that server
18:27 🔗 SketchCow Oh my god, so aspy
18:27 🔗 SmileyG SketchCow: :D
18:27 🔗 SmileyG Was I right, is it due to the work>?
18:27 🔗 SketchCow DOES IT MATTER?
18:27 🔗 SmileyG I dunno?
18:27 🔗 SmileyG btw why caps?
18:27 🔗 SketchCow I guarantee you 5 fulltime people RIGHT NOW are working VERY HARD on network, if there's any.
18:28 🔗 SmileyG Cool, Any plans for a UK based DC? ;)
18:28 🔗 underscor ...hahahaha
18:28 🔗 SketchCow Because aspy posting of network program output deserves caps, chloroform and a dump in the river
18:28 🔗 SmileyG dude so much hate :(
18:28 🔗 SketchCow You don't even KNOW hate
18:28 🔗 SmileyG i didn't paste anything btw, exec -o ftw
18:28 🔗 SketchCow If you knew hate, I'd be in your room right now
18:28 🔗 SmileyG Hm.
18:29 🔗 SmileyG this has gone wildly off topic, shall we go to -bs?
18:29 🔗 underscor watch out, SmileyG. you're playing with fire
18:29 🔗 joepie91 -bs?
18:29 🔗 underscor and SketchCowis a canister of condensed propane
18:29 🔗 SketchCow archiveteam-bs
18:29 🔗 underscor SketchCow is*
18:29 🔗 SmileyG underscor: he seems sane enough to me.
18:29 🔗 joepie91 offtopic channel?
18:29 🔗 SketchCow Yes
18:29 🔗 underscor yes
18:29 🔗 joepie91 wtf I'm already there
18:29 🔗 underscor see /topic
18:29 🔗 * joepie91 stares at self
18:29 🔗 SketchCow it's so offtopic, you're already there.
18:29 🔗 SketchCow Archive Team: You're already there
18:29 🔗 underscor lol
18:29 🔗 joepie91 lol
18:30 🔗 SketchCow Archive Team: The Downloading Is Coming From Inside Your Servers
18:30 🔗 SketchCow http://ia601202.us.archive.org/3/items/test-memac-index-test/tabblo.html has saved my fucking bacon.
18:30 🔗 SketchCow Because OH MY GOD did Hp really fuck up the Tabblo thing.
18:31 🔗 yipdw seems to be an HP motif
18:31 🔗 underscor ^
18:31 🔗 SketchCow It appears in the rush to shut it down, they really didn't do a good job of mailing out notifies.
18:31 🔗 underscor SketchCow: are people twittering/emailing you about it?
18:31 🔗 SmileyG Fucking your shit up, so you don't have it?
18:34 🔗 alard SketchCow: Before you start linking that url, should we make a permanent one?
18:34 🔗 underscor he should just be able to pop it out of the test-items collection
18:34 🔗 underscor (which will remove the auto-purge)
18:35 🔗 alard The name isn't really good.
18:35 🔗 joepie91 SketchCow: I suspect someone at HP noticed 'oh fuck, those servers are still running.. quick, shut them down before the boss notices!'
18:36 🔗 underscor he can rename too
18:36 🔗 SketchCow I am sure it's actually because HP is doing a round of cost-cutting and layoffs for a different reason.
18:39 🔗 ersi Axing it axing it axing it - cause my CEO told me soo
18:39 🔗 SketchCow I have to go now - driving to NYC to take people to a memorial service
18:39 🔗 SketchCow I didn't know her, but they're very broken up about it, so it'll be a tough night.
18:39 🔗 SketchCow When I get back, I'd like to work on some stuff, it'll be late.
18:40 🔗 ersi Good luck with the driving and attending memorial service
18:40 🔗 SketchCow This tabblo search thing, jesus that saves a life
18:40 🔗 ersi Freggin' great work
18:49 🔗 Schbirid 10 minutes until ovh beta server giveaway
18:56 🔗 joepie91 :o
18:56 🔗 Schbirid 2 minutes
18:57 🔗 underscor link?
18:57 🔗 Schbirid aka start now
18:57 🔗 Schbirid http://www.ovh.com/fr/serveurs_dedies/commande_usa_beta.xml
18:57 🔗 Schbirid you need to follow them already
18:57 🔗 underscor I do
18:57 🔗 Schbirid ignore the message about the servers being gone
18:58 🔗 Schbirid keeping trying
18:58 🔗 joepie91 would anyone have a use for a parser that parses a load of .eml files, and generates an attachment directory + sqlite database of all of them, plus can optionally render the entire database into a bunch of static HTML including sorting by several fields?
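A rough sketch of the first half of what joepie91 describes (a directory of .eml files turned into an attachment directory plus an SQLite index), using only Python's standard library. It is not his emailparser (linked later in this log); the function, table, and column names are hypothetical, and the static-HTML rendering step is left out.

```python
import sqlite3
from email import policy
from email.parser import BytesParser
from pathlib import Path


def index_eml_files(eml_dir, db_path, attachment_dir):
    """Parse every *.eml in eml_dir into SQLite and save attachments to disk."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS messages ("
               "id INTEGER PRIMARY KEY, file TEXT, sender TEXT, subject TEXT, date TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS attachments ("
               "message_id INTEGER REFERENCES messages(id), filename TEXT, path TEXT)")
    out = Path(attachment_dir)
    out.mkdir(parents=True, exist_ok=True)
    parser = BytesParser(policy=policy.default)  # yields EmailMessage objects
    for eml in sorted(Path(eml_dir).glob("*.eml")):
        with eml.open("rb") as f:
            msg = parser.parse(f)
        cur = db.execute(
            "INSERT INTO messages (file, sender, subject, date) VALUES (?, ?, ?, ?)",
            (eml.name, str(msg.get("from", "")), str(msg.get("subject", "")),
             str(msg.get("date", ""))))
        msg_id = cur.lastrowid
        # iter_attachments() yields the non-body sub-parts of a multipart message
        for n, part in enumerate(msg.iter_attachments()):
            name = part.get_filename() or f"part-{n}"
            # Keep only the final path component so a hostile filename cannot escape out/
            dest = out / f"{msg_id}-{n}-{Path(name).name}"
            dest.write_bytes(part.get_payload(decode=True) or b"")
            db.execute("INSERT INTO attachments VALUES (?, ?, ?)",
                       (msg_id, name, str(dest)))
    db.commit()
    return db
```

Sorting and HTML output would then be ordinary SELECT ... ORDER BY queries over the messages table.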
18:58 🔗 shaqfu Weren't people having serious issues with OVH earlier?
18:58 🔗 underscor what is Désolé, nous avons atteint la limite des serveurs disponibles aujourd'hui. Revenez demain? [French: "Sorry, we have reached the limit of servers available today. Come back tomorrow."]
18:58 🔗 Schbirid lol
18:58 🔗 Schbirid i suck
18:58 🔗 Schbirid utc 2100 is 2 hours away
18:58 🔗 Schbirid sorry
18:58 🔗 underscor damn
18:58 🔗 underscor hahaha
18:58 🔗 underscor np
18:58 🔗 * joepie91 feels like a captcha monkey
18:58 🔗 Schbirid until then, go find the link on ovh.co.uk or so ;)
18:58 🔗 shaqfu underscor: They're out of servers
18:59 🔗 underscor ah
18:59 🔗 shaqfu Holy fuck I spent literally two hours studying French and read that; +1 baller points to me
18:59 🔗 shaqfu Anyway, weren't people complaining about OVH last giveaway?
19:03 🔗 SketchCow Thank you, thank you, thank you. You have saved my memories and all my hard work of putting them together. Please thank everyone on your team from the bottom of my heart. You understand something that HP can never understand, the most important thing in life are the people in it. We are not just users or customers adding to your bottom line, but we are people. We are so sad to see our memories just wiped away with a switch.
19:03 🔗 SketchCow Thank you so much for doing this us.
19:03 🔗 underscor Wow. That's really touching
19:03 🔗 underscor It's things like that that make all the effort worth it
19:04 🔗 SmileyG yah
19:06 🔗 Schbirid i dont get why companies dont just make such sites static
19:07 🔗 shaqfu $
19:10 🔗 underscor SketchCow: you should xpost that to collections :)
19:11 🔗 underscor picplz estimate 2.5TB
19:16 🔗 primus_ awesome job on virtual machine, thanks, that makes it really easy to contribute
19:20 🔗 DFJustin I don't think the test collection cleanup is very aggressive, I have stuff that's been in there since february
19:21 🔗 joepie91 okay, what the hell just happened
19:21 🔗 joepie91 my router completely shit itself :|
19:22 🔗 joepie91 what I said and probably didn't arrive: <joepie91>meh, anyhow, if anyone wants said email parser: git clone http://git.cryto.net/repo/projects/joepie91/emailparser/
19:23 🔗 yipdw re: Tabblo: http://thenextweb.com/insider/2012/06/03/startups-should-bend-over-backwards-to-let-users-take-their-data-after-they-shut-down/
19:23 🔗 yipdw (and Picplz)
19:24 🔗 yipdw the comment by "Mike Post" is really weirf
19:24 🔗 yipdw weird
19:31 🔗 LordNlptp SketchCow: was there anything not already archived in the geocities stuff i sent?
19:34 🔗 joepie91 yipdw: sounds like the average 'you have to pay for the air you breathe' guy...
19:37 🔗 joepie91 anyhow, are programmers currently needed to write crawlers or anything?
19:40 🔗 yipdw joepie91: for picplz? no
19:40 🔗 yipdw joepie91: #piczzz
19:40 🔗 joepie91 just in general :P
19:41 🔗 * joepie91 enjoys crawling, parsing, etc
19:41 🔗 yipdw oh
19:42 🔗 yipdw in general, yes
19:42 🔗 yipdw please do check out the *-grab repositories on ArchiveTeam's github for AT conventions
19:42 🔗 yipdw re: file format, reporting, etc
19:45 🔗 joepie91 alright, will have a read soon
19:45 🔗 joepie91 is there any specific document that lists the whole thing, by any chance? or best to just try to derive it from the repos?
19:46 🔗 yipdw none yet
19:46 🔗 yipdw I guess documenting archival standards would be a good thing
19:47 🔗 ersi what? archival standards?
19:47 🔗 yipdw best practices for ArchiveTeam projects, yes
19:48 🔗 yipdw as far as I know, we have no such document
19:48 🔗 ersi do we even have best practises?
19:48 🔗 ersi that's news to me :)
19:49 🔗 yipdw if you look at the *-grab repositories, there are patterns and conventions
19:49 🔗 yipdw and I think usage of WARC is a best practice
19:49 🔗 ersi well, besides A) doing is more important than thinking B) BE ON A SANE FILESYSTEM, YOU FUCK C) TIMESTAMPS AND SHIT!
19:49 🔗 ersi yeah, WARC all the way baby
19:49 🔗 yipdw the point is to codify them so that people who want to write software to do crawling have a place to start
19:50 🔗 yipdw also, that assists with point (A)
19:50 🔗 ersi ye
19:50 🔗 ersi True enough, I guess
19:51 🔗 yipdw I don't know how important (B) is at this point
19:51 🔗 yipdw it *was* important before we had a standard format
19:51 🔗 yipdw rephrased -- it was important when there was no standard format
19:52 🔗 yipdw I don't know if it still is
19:52 🔗 yipdw but, yes, the idea is to hash all that sort of stuff out
19:52 🔗 ersi It was *very* important before we started doing WARC
19:52 🔗 yipdw that's what I wrote
19:52 🔗 ersi as well as C) - but WARC takes care of that as well
19:53 🔗 ersi let me fucking rephrase it then
19:53 🔗 ersi "It's not as important now, because of exactly what you pointed out a few lines up"
19:53 🔗 ersi Anyhow, it's all good and yes - it's a good idea to write it down, good idea sir
19:54 🔗 yipdw (1) there could be other factors that rely on filesystem semantics that I haven't considered
19:54 🔗 yipdw (2) calm down
19:55 🔗 ersi http://i.qkme.me/35et2u.jpg
20:57 🔗 underscor ovh giveaway in 2 minutes
20:59 🔗 underscor I got one!
20:59 🔗 underscor yay
21:00 🔗 chronomex wheep wheep
21:02 🔗 BlueMaxim ovh?
21:02 🔗 underscor server company
21:05 🔗 ersi Do we really need to link people to Google in here? I'll leave this right here in case anyone has something they don't know and is curious to find out more about: http://google.com (Hint: It's a search engine) :D
21:09 🔗 underscor lol
21:32 🔗 joepie91 oh goddamnit
21:32 🔗 joepie91 missed the giveaway
21:33 🔗 joepie91 ffs
21:45 🔗 underscor joepie91: there's always tomorrow :D
21:46 🔗 joepie91 it's just a bit aggravating, because the *reason* for missing it is a CERTAIN journalist that has been spreading bullshit
21:46 🔗 joepie91 so it's basically someone that I was already mad at, contacting me at this exact point
21:46 🔗 joepie91 making me miss the OVH giveaway on top of that
21:46 🔗 joepie91 :|
21:47 🔗 joepie91 also, pm
21:51 🔗 arrith ovh giveaway eh
21:56 🔗 arrith found it hm
22:25 🔗 dashcloud holy crap that index thing for tabblo is good
22:37 🔗 SmileyG hmmm
22:37 🔗 SmileyG I have no idea what tabblo is :S
22:37 🔗 SmileyG Do I suck?
22:41 🔗 dashcloud nope- it's hard to keep up with all the websites shutting down or shutdown these days
22:50 🔗 joepie91 http://owely.com/4cYUBD
23:00 🔗 chronomex oh sweet, since we made the picplz tracker faster, it looks like fetches have sped up
23:01 🔗 chronomex judging from the graph
23:27 🔗 oli anyone familiar with tracking down processes on a linux box that are using up lots of cpu but are not obvious with top or iotop?
23:30 🔗 chronomex why wouldn't the process in question show up?
23:33 🔗 dashcloud have you tried tree view? (so you can see parent & child processes easily)
23:33 🔗 dashcloud I like htop a lot as a better top
23:35 🔗 oli chronomex: i don't know, and dashcloud yes i have (with htop)
23:35 🔗 chronomex odd
23:35 🔗 oli 09:05:27 up 55 days, 15:47, 1 user, load average: 2.64, 2.75, 2.54
23:35 🔗 oli load has been 2 for days
23:35 🔗 oli the cpu has 4 cores/8 threads so its not killing the box
23:35 🔗 oli its just annoying not knowing what it is
23:36 🔗 dashcloud if you're running a desktop, make sure you check the plugin-container (or the actual plugin) should you have a web browser running
23:36 🔗 oli it's a server
23:37 🔗 dashcloud have you tried sorting it by time? maybe your mystery process will float to the top if it's using a lot of CPU time
23:38 🔗 oli htop
23:38 🔗 chronomex also cumulative time including dead children
23:38 🔗 oli wrong window my bad
23:38 🔗 oli chronomex: how do i get that?
23:39 🔗 chronomex in top, press capital-S
23:39 🔗 oli i tried by time but couldnt see anything, at first i thought it was a minecraft server inside a VPS but i stopped the entire container and its still sitting at 2
23:41 🔗 oli http://pastie.org/private/lkjsanhur7vht3rwrvyvqg
23:41 🔗 GLaDOS load: 50.81 50.23 51.95 49/435 18272
23:41 🔗 oli does that pastie help? :/
23:41 🔗 oli that's by time and cumulative
23:44 🔗 chronomex oli: nothing looks wrong
23:44 🔗 chronomex Cpu(s): 0.6%us, 0.8%sy, 0.0%ni, 97.1%id, 1.5%wa, 0.0%hi, 0.0%si, 0.0%st
23:44 🔗 joepie91 ohi oli :)
23:44 🔗 chronomex 97% idle, what more do you want
23:45 🔗 oli so why is load sitting at average >2
23:45 🔗 oli and has been for days
23:45 🔗 chronomex looks like you've got two processes in loops that are continually begging for cpu time and then yielding it immediately
23:45 🔗 chronomex dunno why they would do that
23:46 🔗 oli strange
23:46 🔗 Coderjoe iirc, loadavg is related to processes waiting on io in some way
23:47 🔗 oli then iotop should show it?
23:47 🔗 oli htmm
23:47 🔗 oli 809 be/3 root 0.00 B/s 15.19 K/s 0.00 % 28.53 % [jbd2/sda4-8]
23:47 🔗 Coderjoe i would think it would
23:47 🔗 oli ext4 journal
23:47 🔗 oli or some shit
23:47 🔗 chronomex I wouldn't worry about it much if your system is still responsive
23:47 🔗 chronomex but then, I'm the 3rd worst sysadmin in history
23:47 🔗 bsmith094 3rd worst
23:48 🔗 bsmith094 ???
23:48 🔗 oli ok i think its ext4, might reboot with noatime option next
23:48 🔗 oli one of the mysqldbs is working a bit maybe that is causing the journalling to be retarded
23:49 🔗 oli and yes its still responsive so dont care that much chronomex
23:49 🔗 oli :)
23:50 🔗 bsmith094 i have a warrior question. is it expecting to get 80gb to itself or is that reset to something else on startup?
23:51 🔗 DrainLbry If any of you guys ever need 3 brand new Que Super Disk 240MB Floppies, sealed new in packaging for whatever crazy ass reason, let me know
23:52 🔗 DrainLbry And on the flip side of that, I've also got the drive that can read them
