#archiveteam-bs 2014-05-11,Sun


Time Nickname Message
00:41 🔗 ersi What was it?
00:58 🔗 godane i just want you to know that cbs radio feeds don't have audio files for 2009-07-25 to 2009-07-31
00:59 🔗 godane i also want you guys to know there are a lot of broken or unplayable files for the daily cnn podcast
00:59 🔗 godane in 2010 anyway
01:01 🔗 godane also with the cbs radio feeds
01:01 🔗 godane i think it stopped around or just before 8:30PM on 2009-07-24
01:02 🔗 godane good news is i think only one mp3 is missing in 2009-08 files
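A gap report like godane's (no CBS radio audio for 2009-07-25 to 2009-07-31, possibly one mp3 missing in 2009-08) is easy to script. A minimal sketch, assuming you have already pulled one date per downloaded file; the log does not show the feed's filename scheme, so that parsing step is left out, and `missing_days` and `collapse` are hypothetical helper names:

```python
from datetime import date, timedelta


def missing_days(present, start, end):
    """Every date from start to end (inclusive) that is not in `present`."""
    have = set(present)
    gaps = []
    day = start
    while day <= end:
        if day not in have:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


def collapse(days):
    """Merge a sorted list of dates into (first, last) runs of consecutive days."""
    runs = []
    for d in days:
        if runs and d - runs[-1][1] == timedelta(days=1):
            runs[-1] = (runs[-1][0], d)  # extend the current run by one day
        else:
            runs.append((d, d))  # start a new run
    return runs


if __name__ == "__main__":
    # Hypothetical example: files found for 2009-07-01 through 2009-07-24 only.
    have = [date(2009, 7, d) for d in range(1, 25)]
    for first, last in collapse(missing_days(have, date(2009, 7, 1), date(2009, 7, 31))):
        print(f"missing {first} to {last}")
```

A day that is present can still be incomplete when a feed stops mid-day, so this only catches whole missing days.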
01:05 🔗 balrog link me one?
01:06 🔗 balrog (of the broken files)
02:25 🔗 godane balrog: http://podcasts.cnn.net/cnn/big/podcasts/cnnnewsroom/video/2010/02/04/the.daily.02.04.cnn.m4v
03:11 🔗 garyrh http://www.quora.com/Data/Where-can-I-find-large-datasets-open-to-the-public
11:54 🔗 godane download.cbsnews.com/media/2007/01/28/video2405937.flv
11:54 🔗 godane that's a 60 Minutes segment talking about tech support
14:19 🔗 godane i'm grabbing the internet history podcast
15:54 🔗 ohhdemgir https://cdn.mediacru.sh/gGp2y7AcPdcO.jpe
18:02 🔗 godane if anyone has the official linux format dvd for issue 166, please check its size against this one: https://archive.org/details/cdrom-linuxformatmagazine-166
18:03 🔗 godane i only ask because when mounted it says there is 4.4gb on the disc
18:03 🔗 godane but it's only 2.3gb in size
18:03 🔗 godane also most linux format dvds are at least 3.9gb
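One way to check a suspect image like the Linux Format 166 one is to compare the size its filesystem declares with the size of the image file. A minimal sketch, assuming the DVD image carries an ISO 9660 volume descriptor set (a UDF-only image would not); `declared_iso_size` and `check_iso` are hypothetical names. In ISO 9660 the descriptors start at sector 16, and the Primary Volume Descriptor stores the volume's block count at byte offset 80 and the logical block size at offset 128, both in both-endian form:

```python
import os
import struct
import sys

SECTOR = 2048         # ISO 9660 logical sector size
FIRST_VD_SECTOR = 16  # the volume descriptor set starts at sector 16


def declared_iso_size(path):
    """Bytes the ISO 9660 Primary Volume Descriptor says the volume occupies."""
    with open(path, "rb") as f:
        sector = FIRST_VD_SECTOR
        while True:
            f.seek(sector * SECTOR)
            vd = f.read(SECTOR)
            if len(vd) < SECTOR or vd[1:6] != b"CD001":
                raise ValueError("no ISO 9660 volume descriptor found")
            if vd[0] == 1:  # type 1 = Primary Volume Descriptor
                # Both-endian fields: read the little-endian half, which comes first.
                (blocks,) = struct.unpack_from("<I", vd, 80)       # volume space size
                (block_size,) = struct.unpack_from("<H", vd, 128)  # logical block size
                return blocks * block_size
            if vd[0] == 255:  # type 255 = Volume Descriptor Set Terminator
                raise ValueError("no primary volume descriptor before terminator")
            sector += 1  # skip boot records and supplementary descriptors


def check_iso(path):
    """Return (declared_bytes, actual_bytes, looks_complete)."""
    declared = declared_iso_size(path)
    actual = os.path.getsize(path)
    return declared, actual, actual >= declared


if __name__ == "__main__" and len(sys.argv) > 1:
    declared, actual, ok = check_iso(sys.argv[1])
    verdict = "looks complete" if ok else "looks truncated"
    print(f"declared {declared} bytes, file has {actual} bytes: {verdict}")
```

An image whose file is smaller than the declared volume size, as with 4.4gb declared versus 2.3gb on disk, was most likely cut short during ripping or upload.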
18:10 🔗 ohhdemgir Jan 31 15:32:33 <Schbirid> someone do ftp://gamefiles.blueyonder.co.uk/ - https://archive.org/details/gamefiles.blueyonder.co.uk (only 4 months later..) underscor SketchCow ? That needs sexying up and moving to ftpsites when you get the time (I fluffed the torrent, that can be ignored/removed)
18:40 🔗 ohhdemgir hmm... still new to this.. I take it this derive task will run until it connects to the torrent (which isn't available due to an rtorrent size error)? I uploaded the file afterwards with the python package; that finished, but the task is still waiting to be run... https://catalogd.archive.org/history/gamefiles.blueyonder.co.uk
18:40 🔗 ohhdemgir confusing, murp :3
18:43 🔗 SketchCow Not sure what is happening there, actually.
18:43 🔗 SketchCow Kind of neat.
18:50 🔗 ohhdemgir Did I break it.. XD
18:50 🔗 SketchCow Oh, not sure. Not sure what the torrents do, to be honest.
18:50 🔗 SketchCow How they made them, etc.
18:51 🔗 ohhdemgir Can you cancel the current derive and have it move to the archive task?
18:52 🔗 ersi chfoo is continuously checking in great things to URLTeam's repository
18:52 🔗 ersi (in the repo "terroroftinytown" that is)
19:02 🔗 DFJustin ohhdemgir: the torrent task should time out eventually and then the other task should run
19:02 🔗 ohhdemgir eventually.. heh
19:02 🔗 ohhdemgir okies :)
19:02 🔗 DFJustin takes at least a day, I forget
19:03 🔗 ohhdemgir when getting .gov sites from the list do I just ignore headers like
19:03 🔗 ohhdemgir Anonymous user logged in.
19:03 🔗 ohhdemgir U.S. Government computer, unauthorized use prohibited by Title 18, U.S.C.
19:03 🔗 ohhdemgir Welcome, ftp, to ftp,cdc,gov
19:05 🔗 DFJustin that's just boilerplate, unauthorized use of any server is prohibited by law
19:05 🔗 DFJustin government or no
19:06 🔗 DFJustin if it's a publicly listed site that allows anonymous login then presumably that use is authorized
19:06 🔗 ohhdemgir sounds good
19:07 🔗 DFJustin the courts can be stupid about that though as demonstrated by the weev and manning cases
19:07 🔗 rocode DFJustin: Unfortunately, as of yet not upheld by the courts.
19:07 🔗 rocode Yeah
19:08 🔗 rocode We had a local case where a kid "hacked" the sheriff website by going to a non-listed URL and downloading records.
19:08 🔗 ohhdemgir I'm in the uk, ripping US sites from a server in france, fuck it
19:12 🔗 nico these countries could extradite you to the usa
19:12 🔗 ohhdemgir I'll risk it
19:13 🔗 rocode I am in the US. I probably violated six different laws just getting to work this morning.
19:13 🔗 ersi nico: and an asteroid *could* hit you in the head
19:13 🔗 rocode ersi: Bus is much more likely.
19:14 🔗 ersi I said nothing about probability
19:15 🔗 schbirid ohhdemgir: nice work with gamefiles by!
19:15 🔗 ohhdemgir sorry I let it sit for so long
19:16 🔗 schbirid if you consider that long then i don't want to talk about the fileplanet stuff ever again ;P
19:16 🔗 ohhdemgir .. I ... I still have some of that..
19:16 🔗 ohhdemgir XD
19:16 🔗 godane i'm downloading more cbsnews stuff
19:16 🔗 godane :-D
19:16 🔗 godane some of the stuff in 2007 is very interesting
19:19 🔗 godane is it bad to be uploading 4 things and downloading 4 things at the same time?
19:19 🔗 schbirid ohhdemgir: you must have changed nicks ;)
19:19 🔗 godane i really think my ocd is kicking in today
19:20 🔗 schbirid anyways, if you have leftovers from the id iteration downloading we did, you can safely delete
19:20 🔗 ohhdemgir schbirid, I was tarx or tarxvf before
19:20 🔗 schbirid :)
19:20 🔗 schbirid i am more of a tarxfvz guy
19:21 🔗 ohhdemgir heh
19:22 🔗 rocode I used p7z because I kept forgetting the tar flags.
19:23 🔗 ohhdemgir rocode, that's why I used it as my username XD
19:23 🔗 ohhdemgir I always used p7z before
19:23 🔗 rocode Ah.
19:24 🔗 ohhdemgir schbirid, any site I should get next?
19:26 🔗 schbirid if you could do anything to turn reddit back to ~2009 and remove all the fucking image macros (are they still called that?) from the web, that would be nice
19:26 🔗 rocode schbirid: I run my own reddit proxy that pretty much does the same thing. That site really went to hell.
19:27 🔗 ohhdemgir I think I have enough data to host reddit as it was in 2009 XD
19:27 🔗 rocode ohhdemgir: It's ~100gb. There was a redditdev backup floating around. Two tables *shudder*
19:28 🔗 ohhdemgir yeah, ish* I have most of it 2007 - early 2013
19:29 🔗 rocode Most reddit data is worthless unless you get their researcher feed, with the amount of fudging they do.
19:30 🔗 ohhdemgir pain in the ass though: last time I put it up, the admins took it down and asked me to see how they 'wished to handle the release of such data'. Never heard back; will IA it when I get the chance
19:31 🔗 ohhdemgir right now I'm using it to put up things like this http://www.reddit.com/r/AmateurArchives/comments/24vr5r/rgonewild_history_20092013_torrents/
19:31 🔗 ohhdemgir https://archive.org/search.php?query=Gonewild%20Data
19:31 🔗 ohhdemgir because boobies and data, yiss!
19:31 🔗 balrog LOL
19:32 🔗 rocode Reddit admins try to avoid overt backups because of the legal mess their user contributed data is.
19:32 🔗 SketchCow A few more of those have gone down.
19:32 🔗 balrog hah
19:32 🔗 SketchCow Like, a couple albums.
19:32 🔗 rocode They shut down our /r/theoryofreddit bot because we were using old data to try to create a heuristic moderation system.
19:32 🔗 ohhdemgir SketchCow, from the original 220GB one?
19:32 🔗 balrog rocode: wow...........
19:32 🔗 SketchCow Ostensibly, yes
19:32 🔗 balrog the problem you're gonna run into here is that you can't remove a small subset without removing the whole thing
19:33 🔗 ohhdemgir SketchCow, tsk, silly, I'm trying to either not include usernames or release those separately now
19:33 🔗 rocode balrog, communities as a whole go through this cycle constantly. Slashdot saw the same, fark saw the same. When enough money and public interest occurs, things go to hell.
19:34 🔗 SketchCow Awww, it's rocode, our little bucket of reality
19:34 🔗 rocode :(
19:36 🔗 ohhdemgir balrog, true but I feel warm and fuzzy knowing that ia still has it :3
19:36 🔗 balrog would be nice if there was a way to only dark a portion of an archive
19:36 🔗 ohhdemgir agreed
19:36 🔗 rocode SketchCow: Someone has to save all this data to hand over to our AI overlords of the future.
19:36 🔗 SketchCow Is this the part where rocode is going to win me over to archiving maximalism?
19:36 🔗 * SketchCow gets popcorn
19:37 🔗 SketchCow http://www.cbc.ca/strombo/content/images/mj-popcorn.gif
19:37 🔗 rocode Archiving maximalism? Saving everything?
19:37 🔗 SketchCow Regarding the Gonewild Archive situation, your problem is that it's WAY too large and WAY too big for one file.
19:38 🔗 SketchCow It should be, like, 4 items, each with 100 files or so.
19:38 🔗 SketchCow I say this with full 20/20 hindsight.
19:38 🔗 SketchCow I mean, there was no way to know, but now that people are coming out to take issue, it becomes the case.
19:39 🔗 ohhdemgir aye, seems without even linking to it underscor's upload went dark too
19:40 🔗 rocode I think you may be mistaking me for someone else, SketchCow, I was referring to historical voting data and comment history of reddit.
19:40 🔗 ohhdemgir the only way around it is archiving each user as a new item, which is a pain in the ass
19:41 🔗 SketchCow Well, no.
19:41 🔗 SketchCow The way I proposed will make it so the part that needs replacement is much smaller and manageable.
19:42 🔗 SketchCow Right now you have to basically put an aircraft carrier up on blocks to yank a single bolt off the bottom.
19:42 🔗 SketchCow Having to put ONE truck out of a fleet up on blocks is still annoying but comparatively OK.
19:44 🔗 rocode Well, wouldn't it be easier to handle smaller chunks that you can always combine into a large chunk later if needed?
19:44 🔗 ohhdemgir hmm, true, we'll see when it comes to uploading more
19:44 🔗 ohhdemgir I think there is around 120GB waiting again
19:45 🔗 balrog ohhdemgir: how difficult is it to make a script that splits it?
19:46 🔗 ohhdemgir each user has their own folder, it can be split any old way and still make sense
19:46 🔗 ohhdemgir so, easy
19:46 🔗 balrog then where is this difficult?
19:47 🔗 SketchCow Beyond that, you don't need to split it within users.
19:47 🔗 SketchCow You can just split users.
19:47 🔗 balrog rocode: how do they deal with the people who run the "undelete comment" stuff?
19:47 🔗 ohhdemgir ^ this
19:47 🔗 SketchCow No user's going to be more than a gig.
19:47 🔗 balrog yeah, since a takedown request will nearly always be for at least an entire user
19:47 🔗 ohhdemgir SketchCow, some are 3-5GB
19:47 🔗 SketchCow No user's going to be more than 10 gigs.
19:47 🔗 ohhdemgir lol
19:47 🔗 SketchCow Same difference.
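SketchCow's suggestion (a handful of items instead of one, split along user boundaries so a takedown only touches one item) comes down to a small bin-packing problem. A minimal sketch, assuming one top-level folder per user as ohhdemgir describes; it balances items by bytes rather than file count, uses greedy largest-first placement rather than an optimal partition, leaves out the actual upload, and `dir_size` and `plan_items` are hypothetical helpers:

```python
import sys
from pathlib import Path


def dir_size(path):
    """Total size in bytes of the regular files under `path`."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())


def plan_items(user_sizes, n_items):
    """Split whole users into n_items groups with roughly equal byte totals.

    Greedy largest-first: each user, biggest first, joins the currently
    smallest group. No user is ever split across two items.
    """
    groups = [{"users": [], "bytes": 0} for _ in range(n_items)]
    for user, size in sorted(user_sizes.items(), key=lambda kv: kv[1], reverse=True):
        target = min(groups, key=lambda g: g["bytes"])
        target["users"].append(user)
        target["bytes"] += size
    return groups


if __name__ == "__main__" and len(sys.argv) > 2:
    root, n = Path(sys.argv[1]), int(sys.argv[2])
    sizes = {d.name: dir_size(d) for d in root.iterdir() if d.is_dir()}
    for i, g in enumerate(plan_items(sizes, n), 1):
        print(f"item {i}: {len(g['users'])} users, {g['bytes'] / 1e9:.1f} GB")
```

Keeping each user whole matches balrog's point that a takedown request will nearly always cover at least an entire user, so only one item ever needs to go dark.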
19:47 🔗 rocode No user will need more than 640kb
19:48 🔗 rocode balrog: They leave it up to the submod staff and note it in the reddit ToS as harassment.
19:48 🔗 rocode a.k.a CYA
19:48 🔗 balrog rocode: which part?
19:48 🔗 rocode sec
19:49 🔗 balrog I'm talking about https://www.unedditreddit.com
19:49 🔗 balrog (lol expired cert)
19:50 🔗 rocode balrog: Oh, thought you meant the auto screenshot bot
19:50 🔗 rocode 3rd party sites are 3rd party, therefore they don't care. If it becomes an issue, they ban the IPs from the API.
19:50 🔗 balrog do they use the API or do they scrape?
19:51 🔗 rocode API, AFAIK.
19:52 🔗 rocode Heh, firefox mobile does not allow expired certs, not even temporarily.
19:53 🔗 rocode Oh, they are scraping. Those guys got banned from the API.
19:55 🔗 godane SketchCow: you may be getting a marxists.org section for texts
19:56 🔗 godane i'm trying to upload the pdfs i got from that site
19:59 🔗 rocode godane: Was your download prior to the april 30th purge?
20:00 🔗 godane yes
20:01 🔗 godane i got about 80gb
20:01 🔗 godane but i think it started to redownload stuff so i killed it
20:01 🔗 godane i think it was redownloading because i had the -E option in my script
20:02 🔗 godane which saves files as .html when there is a folder link
20:02 🔗 godane but the site has a lot of .htm files
20:03 🔗 godane so i guess it was redownloading the .htm files
20:03 🔗 godane it's better to have the -E option, otherwise you get folder instead of folder.html
20:04 🔗 godane that way the folder/file.pdf will get downloaded
20:05 🔗 godane otherwise it will say folder can't be written to, since folder would have to be both a file and a folder
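The file-versus-folder clash godane describes can be modelled in a few lines. A simplified sketch of the local-filename rule for wget's `-E` (`--adjust-extension`), following the manual's description (append `.html` to an HTML response whose URL does not end in `\.[Hh][Tt][Mm][Ll]?`); it ignores wget's host directories, query strings and the other content types newer versions also adjust, and `local_name`, `conflicts` and the example.org URLs are hypothetical, not part of wget:

```python
import re
from urllib.parse import urlsplit

# Suffix test taken from the wget manual's description of -E / --adjust-extension.
HTML_SUFFIX = re.compile(r"\.[Hh][Tt][Mm][Ll]?$")
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def local_name(url, content_type, adjust_extension):
    """Simplified local path for a download (no host directory, no query)."""
    path = urlsplit(url).path.lstrip("/")
    if not path or path.endswith("/"):
        path += "index.html"
    if adjust_extension and content_type in HTML_TYPES and not HTML_SUFFIX.search(path):
        path += ".html"
    return path


def conflicts(paths):
    """Paths that would have to exist as both a file and a directory."""
    files = set(paths)
    clashes = set()
    for p in paths:
        parts = p.split("/")
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in files:  # a parent directory is already taken by a file
                clashes.add(prefix)
    return sorted(clashes)


if __name__ == "__main__":
    # Hypothetical URLs: an HTML page served at /folder plus a PDF stored beneath it.
    urls = [("http://example.org/folder", "text/html"),
            ("http://example.org/folder/file.pdf", "application/pdf")]
    for adjust in (False, True):
        names = [local_name(u, t, adjust) for u, t in urls]
        print(adjust, names, conflicts(names))
```

Without `-E` the page is saved as `folder` and blocks the `folder/` directory; with it the page becomes `folder.html` and the PDF fits underneath. Under the manual's rule, URLs already ending in `.htm` keep their name.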
20:06 🔗 godane here is the upload item: https://archive.org/details/www.marxists.org-20140426
20:48 🔗 garyrh http://alcatel-lucent.com/bstj/
20:49 🔗 SketchCow garyrh: Already imported into archive.org.
20:49 🔗 garyrh great!
21:51 🔗 exmic failing miserably at reserving a hotel room, apparently
21:51 🔗 exmic who knew this was so hard
21:51 🔗 SketchCow Where?
21:52 🔗 SketchCow What are you using?
21:52 🔗 SketchCow I tend to use Kayak these days for the US
21:52 🔗 exmic cool
21:52 🔗 exmic hipmunk told me that something broke and it didn't get reserved
21:52 🔗 exmic some other site said my credit card said "NO"
21:54 🔗 SketchCow CVS Loyalty Cards are not Credit Cards, you know that right
21:54 🔗 exmic hmmm
21:54 🔗 exmic really?
21:54 🔗 SketchCow I know
21:54 🔗 SketchCow I KNOWWWWW
21:54 🔗 SketchCow I was surprised too
21:55 🔗 exmic you used to be able to pay for airplane telephone service with radioshack gift cards
21:55 🔗 SketchCow That was one aaaaangry hooker
21:55 🔗 exmic because they had a credit-card-like magstripe
21:55 🔗 exmic lol
21:55 🔗 SketchCow Oh yeah, because they couldn't run the thing until you landed
21:55 🔗 SketchCow So always use the seat next to you
21:55 🔗 exmic ding ding ding
21:56 🔗 exmic despite having phones on planes, they couldn't use modems on planes
21:56 🔗 exmic or something
21:56 🔗 SketchCow Props to the hand-wavy cockblaster who pooh-poohed that scenario at the meeting
21:59 🔗 exmic maybe spending money on canadian hotels is not a scenario that my bank envisioned me wanting to do
21:59 🔗 SketchCow I do find you have to call ahead to the bank to get the card opened to that.
21:59 🔗 SketchCow Like, when I was married to a canadian, this came up all the time.
21:59 🔗 SketchCow "I'm going to Canadia, free the card"
22:00 🔗 SketchCow Otherwise I was Mr. Big for dinner and couldn't buy a gum stick the next morning.
22:01 🔗 exmic lol
22:01 🔗 exmic canada, pfeh, who goes there
22:01 🔗 SketchCow My Boston bank would block my card if I bought 4 things in NYC
22:03 🔗 exmic to be fair, new york is really far from boston
22:04 🔗 exmic there aren't even any direct flights
22:06 🔗 exmic hm, what's the state department say about traveling to canada
22:06 🔗 exmic are there any dictatorships there
22:28 🔗 Smiley urgh fucking censorship
