#archiveteam 2012-06-11,Mon

Time Nickname Message
01:54 🔗 Coderjoe I haven't looked at what is written in github yet, but this might be useful: http://multimedia.cx/eggs/multiprocess-fate-revisited/
03:42 🔗 Coderjoe grr
03:44 🔗 chronomex hi
03:44 🔗 chronomex sup
04:17 🔗 S[h]O[r]T hi
04:26 🔗 Dvorak heyy
04:31 🔗 Dvorak what you up to S[h]O[r]T
06:05 🔗 S[h]O[r]T not much was playing battlefield 3, sleeping now
06:05 🔗 S[h]O[r]T started doing some memac stuff again
06:47 🔗 SketchCow MORNING
06:49 🔗 Dvorak morning
06:49 🔗 Dvorak SketchCow, quick pm?
08:08 🔗 BlueMaxim good morning SketchCow
13:44 🔗 underscor http://archive.org/stats/s3.php some beta s3 stats
13:44 🔗 underscor pretty colors :D
13:49 🔗 ersi_ mr leaky man
13:49 🔗 BlueMaxim leakyscor
13:50 🔗 ersi_ pretty colours indeed
14:11 🔗 godane 11 episodes of crankygeeks to go
14:12 🔗 ersi haha, great name
14:14 🔗 * SmileyG ponders if he can watch at work
16:46 🔗 Schbirid does wget-bzr save the commandline used to the warc file?
16:46 🔗 Schbirid might be handy information
16:47 🔗 ersi If you're saving to a WARC, then yes. It adds a WARC-header, if you do less on the WARC you'll see it (There's a lot of text. Each data you save has a header, which is plain text - even if the data is gzipped)
16:48 🔗 Schbirid awesome
16:49 🔗 Schbirid heh, eeek. now it will be forever known that it was me who mirrored the gamespy forums
16:49 🔗 Schbirid i used a user-agent with contact address
16:50 🔗 ersi Well, the WARC-header part isn't displayed publicly if you use a tool like Wayback, warc-proxy or such
16:51 🔗 Schbirid yeah, i am actually fine with it
16:51 🔗 ersi I bet it's scrubbable - but I recommend always setting the user agent by yourself - just in case :)
16:51 🔗 Schbirid the only worry would be gamespy being angry
16:51 🔗 Schbirid but IGN is so dumb, they probably wont even notice
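[Editor's sketch, not part of the log] ersi's point about the plain-text WARC header can be checked without paging through the file with less. The Python below reads only the first record of a WARC (gzipped or not) and splits its body into "name: value" fields. It assumes the first record is a warcinfo record and that the command line appears in a field named wget-arguments; that field name is an assumption about wget's output, not something stated in the conversation.

```python
import gzip
import io


def read_first_record(stream):
    """Return (headers, body) for the first record of a binary WARC stream."""
    version = stream.readline().strip()  # e.g. b"WARC/1.0"
    if not version.startswith(b"WARC/"):
        raise ValueError("stream does not start with a WARC record")
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):  # blank line ends the header block
            break
        name, _, value = line.decode("utf-8", "replace").partition(":")
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))
    return headers, stream.read(length)


def warcinfo_fields(body):
    """Split a warcinfo body into a dict of lower-cased field names."""
    fields = {}
    for line in body.decode("utf-8", "replace").splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip().lower()] = value.strip()
    return fields


def first_record_fields(path):
    """Open a .warc or .warc.gz file and return (headers, fields) of record 1."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as stream:
        headers, body = read_first_record(stream)
    return headers, warcinfo_fields(body)
```

Calling first_record_fields("mirror.warc.gz") (a hypothetical file name) and looking up "wget-arguments" in the second return value would show the saved command line, including any user-agent string, if the field is present.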
17:38 🔗 SketchCow HEY GANG
17:38 🔗 SketchCow OK, great talk/keynote given
17:38 🔗 SketchCow Did us well.
17:38 🔗 SketchCow Favorite line: "We are not hackers, we're just very, very energetic."
17:39 🔗 SketchCow Anyone look at Meeblo yet?
17:42 🔗 SketchCow Meebo, I mean.
17:44 🔗 balrog- SketchCow: legalize was looking for you in #discferret
18:14 🔗 godane SketchCow: I uploaded all crankygeeks onto archive.org
19:50 🔗 Schbirid night
20:48 🔗 underscor Hmm, anyone actually gotten their OVH giveaway server?
20:50 🔗 ersi underscor: yeah, Schbirid actually has one
20:51 🔗 underscor hmm
20:51 🔗 underscor wonder why mine's taking so long
20:56 🔗 chronomex they know what you're intending to do with it
21:21 🔗 SketchCow WHERE'S THE HUGS
21:25 🔗 Dvorak The hugs are in a bag over there, not many left though grab em while you can.....
21:25 🔗 Dvorak also, "site:fileplanet.com inurl:hosteddl" returns 6400 results, seems small, is that really all we need...
21:25 🔗 SketchCow Oh god, bag hugs
21:27 🔗 Dvorak 4300*, hmm
21:35 🔗 SketchCow http://lanlsource.lanl.gov/hello is interesting
21:42 🔗 chronomex hrm
21:56 🔗 Dvorak Anyone here working on the fileplanet archive?
22:36 🔗 dashcloud SketchCow: did you ever do a post on your workflow for capturing VHS tapes for GDC? I'm doing some now, and I would be interested to know what your workflow was
22:46 🔗 Coderjoe how about using: inurl:fileplanet.com inurl:hosteddl
22:46 🔗 Coderjoe unfortunately, google tailors search results to the user, as well as gives different results depending on what cluster you ask
22:49 🔗 Dvorak Coderjoe, that gives roughly the same give or take a few results
22:50 🔗 Coderjoe i was given "about 5460" results. they might vary a bit compared to yours, however
22:52 🔗 Dvorak yeah that threw me a little too, see http://support.google.com/webmasters/bin/answer.py?hl=en&answer=70920
22:53 🔗 Dvorak i managed to get only 550 actual results without dupes. seems they're no longer indexing as many as 4000+. may have to find another way to get the list, as I'm sure there are more than 550 files hosted
22:53 🔗 Coderjoe it would be nice if there were a way to tell google search that you don't care about speed
22:54 🔗 Dvorak yeah, they don't let you list more than 100 results per page, and that's only due to speed, silly google
22:55 🔗 Coderjoe whee
22:55 🔗 Coderjoe their help no longer matches actual behavior, it seems
22:56 🔗 Coderjoe http://support.google.com/websearch/bin/answer.py?hl=en&p=adv_sitespecific&answer=1734233
22:56 🔗 Coderjoe link: does not appear to work. google asks me "did you mean?" and shows me other crap
22:59 🔗 Coderjoe oh, and NOW it works
22:59 🔗 Coderjoe but doesn't show anything
22:59 🔗 Dvorak google overall is becoming a little too user friendly, I haven't really taken notice of the changes in functionality until now.
23:00 🔗 Coderjoe link:fileplanet.com/hosteddl shows no results
23:00 🔗 Coderjoe yes
23:01 🔗 Dvorak nope, none
23:01 🔗 Coderjoe I have been hating on google for over a year. it seems impossible to find anything I'm looking for anymore
23:01 🔗 Dvorak hmm, we'll see what Schbirid says when he gets up, i sent him the results but I'm still convinced there should be more. coffee is needed, brb
23:01 🔗 instence_ *slams head* argghhhh
23:02 🔗 instence_ drives me nuts when i run into stuff like this
23:03 🔗 instence_ entire news backlog increments backwards in 2 day iterations, no pagination, and only selectbox form for navigating the archive
23:03 🔗 instence_ so now i have to build a 2 day incrementing list of links that dates back 13 years
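[Editor's sketch, not part of the log] The list instence_ describes, one link every two days going back 13 years, can be generated rather than typed. The Python below is a minimal sketch; the site and its URL scheme are not named in the conversation, so the template argument and its {date} placeholder are hypothetical and would need to match whatever the select box actually submits.

```python
from datetime import date, timedelta


def two_day_steps(newest, oldest, step_days=2):
    """Yield dates from newest back to oldest, stepping step_days at a time."""
    step = timedelta(days=step_days)
    current = newest
    while current >= oldest:
        yield current
        current -= step


def archive_links(template, newest, oldest):
    """Fill a hypothetical URL template once per two-day step."""
    return [template.format(date=d.isoformat())
            for d in two_day_steps(newest, oldest)]
```

For example, archive_links("http://example.invalid/news?date={date}", date(2012, 6, 11), date(1999, 6, 11)) would produce the full backlog list for a placeholder site; the real start date and date format depend on the site being archived.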
23:05 🔗 Coderjoe https://duckduckgo.com/?q=site%3Afileplanet.com%20inurl%3Ahosteddl&kp=-1
23:06 🔗 instence_ fyi on fileplanet: most of the results are uncrawlable
23:07 🔗 instence_ so whats coming up on search engines or fileplanet itself are just a fraction of whats really there
23:07 🔗 instence_ because they tried to redo fileplanet but just abandoned the project
23:08 🔗 instence_ if you try to navigate to Q3, in fileplanet itself, theres barely anything that comes up
23:08 🔗 instence_ just a few maps, a few skins etc
23:08 🔗 instence_ its like the DB is incomplete
23:09 🔗 instence_ but, the files are still accessible, if you can find the links to them
23:10 🔗 instence_ its a real mess
23:11 🔗 Dvorak instence_, my thought exactly, but I'm still not sure why... gahh, looks like I landed on a frustrating project =/
23:11 🔗 Coderjoe the general files have been downloaded, afaicr. what we're trying to get are hosted downloads
23:11 🔗 Coderjoe general files have been handled by just incrementing through the fileID space
23:13 🔗 Coderjoe another thing, other than user content, that we could probably help with as a whole: drivers. for example, abit is gone. if someone should need drivers for one of their boards, where do they turn?
23:14 🔗 instence_ okay so fileplanet.com/1000/download
23:14 🔗 instence_ you just incremented through that?
23:14 🔗 instence_ (typing with just my left hand, right wrist broken)
23:15 🔗 Coderjoe yes, that's what they did, iirc
23:15 🔗 instence_ ok cool
23:15 🔗 instence_ i'm going to try and poke at this via another route to have some more specific archives with more context
23:16 🔗 instence_ like, if you go to planetquake, the files section, those are all hosted in fileplanet, but the files section at PQ has juicy info about each file
23:16 🔗 Coderjoe actually, I think they hit the fileinfo pages as well for the metadata
23:17 🔗 instence_ how does one view that off FPok cool
23:17 🔗 instence_ er sorry
23:17 🔗 instence_ ok cool, sounds good
23:19 🔗 instence_ i haven't had much time to look into the FP contect, since i'm barely picking up where i left off on a project i started last january, but got derailed due to life issues
23:19 🔗 instence_ FP content*
23:20 🔗 Coderjoe life sucks
23:21 🔗 Coderjoe always getting in the way
23:21 🔗 instence_ yea its hard to even motivate myself to work on tasks. i should be working on fixing life problems, but arm is broken and jack shit i can do *sigh*
23:22 🔗 instence_ but im forcing myself to work on some projects anyway, since i know eventually i would want to finish them when my state of mind is in a better place
23:23 🔗 instence_ and the data may not be around by the time that happens
23:23 🔗 instence_ so i'm sitting here archiving, cursing the universe at the same time
23:25 🔗 instence_ so lets say there is this file:
23:25 🔗 instence_ http://www.fileplanet.com/37595/download/
23:25 🔗 instence_ what would the file info url be for it?
23:26 🔗 instence_ oh wait i think i see
23:26 🔗 instence_ bingo got it
23:26 🔗 instence_ http://www.fileplanet.com/37595/30000/fileinfo/
23:27 🔗 instence_ so if the id is 25689, the second number has to be 20000
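[Editor's sketch, not part of the log] The URL pattern worked out above can be written down as a small Python helper. It assumes the second path segment is the file ID rounded down to a multiple of 10000, which fits the two examples given (37595 → 30000, 25689 → 20000) but is not confirmed for other IDs, in particular IDs below 10000. The function names are made up for illustration.

```python
BASE = "http://www.fileplanet.com"


def bucket(file_id, size=10000):
    """Round file_id down to a multiple of size (assumed rule, see above)."""
    return (file_id // size) * size


def download_url(file_id):
    """Download URL in the form quoted in the log."""
    return f"{BASE}/{file_id}/download/"


def fileinfo_url(file_id):
    """File info URL, using the assumed bucket rule for the second number."""
    return f"{BASE}/{file_id}/{bucket(file_id)}/fileinfo/"
```

Walking a range of IDs through download_url and fileinfo_url gives the candidate list for the incrementing approach Coderjoe describes; IDs that do not exist would simply have to be skipped when the server rejects them.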
