Time |
Nickname |
Message |
00:41
🔗
|
ersi |
What was it? |
00:58
🔗
|
godane |
i just want you to know that cbs radio feeds don't have audio files for 2009-07-25 to 2009-07-31 |
00:59
🔗
|
godane |
i also want you guys to know there is a lot of broken or unplayable files for the daily cnn podcast |
00:59
🔗
|
godane |
in 2010 anyways |
01:01
🔗
|
godane |
also with the cbs radio feeds |
01:01
🔗
|
godane |
i think it stopped around or just before 8:30PM on 2009-07-24 |
01:02
🔗
|
godane |
good news is i think only one mp3 is missing in 2009-08 files |
01:05
🔗
|
balrog |
link me one? |
01:06
🔗
|
balrog |
(of the broken files) |
02:25
🔗
|
godane |
balrog: http://podcasts.cnn.net/cnn/big/podcasts/cnnnewsroom/video/2010/02/04/the.daily.02.04.cnn.m4v |
03:11
🔗
|
garyrh |
http://www.quora.com/Data/Where-can-I-find-large-datasets-open-to-the-public |
11:54
🔗
|
godane |
download.cbsnews.com/media/2007/01/28/video2405937.flv |
11:54
🔗
|
godane |
thats a 60 minutes segment talking about tech support |
14:19
🔗
|
godane |
i'm grabbing the internet history podcast |
15:54
🔗
|
ohhdemgir |
https://cdn.mediacru.sh/gGp2y7AcPdcO.jpe |
18:02
🔗
|
godane |
if anyone has the official linux format dvd for 166 please check its size against this one: https://archive.org/details/cdrom-linuxformatmagazine-166 |
18:03
🔗
|
godane |
i only ask cause when mounted it will say there is 4.4gb on the disk |
18:03
🔗
|
godane |
but its only 2.3gb in size |
18:03
🔗
|
godane |
also most of linux format dvds are at least 3.9gb |
18:10
🔗
|
ohhdemgir |
Jan 31 15:32:33 <Schbirid> someone do ftp://gamefiles.blueyonder.co.uk/ - https://archive.org/details/gamefiles.blueyonder.co.uk (only 4 months later..) underscor SketchCow ? That needs sexying up and moving to ftpsites when you get the time (I fluffed the torrent, that can be ignored/removed) |
18:40
🔗
|
ohhdemgir |
hmm... still new to this.. I take it this derive task will run until it connects to the torrent (which isn't available due to an rtorrent size error). I uploaded the file afterwards with the python package; that finished, but the task is still waiting to be run... https://catalogd.archive.org/history/gamefiles.blueyonder.co.uk |
18:40
🔗
|
ohhdemgir |
confusing, murp :3 |
18:43
🔗
|
SketchCow |
Not sure what is happening there, actually. |
18:43
🔗
|
SketchCow |
Kind of neat. |
18:50
🔗
|
ohhdemgir |
Did I break it.. XD |
18:50
🔗
|
SketchCow |
Oh, not sure. Not sure what the torrents do, to be honest. |
18:50
🔗
|
SketchCow |
How they made them, etc. |
18:51
🔗
|
ohhdemgir |
Can you cancel the current derive and have it move to the archive task? |
18:52
🔗
|
ersi |
chfoo is continuously checking in great things to URLTeam's repository |
18:52
🔗
|
ersi |
(in the repo "terroroftinytown" that is) |
19:02
🔗
|
DFJustin |
ohhdemgir: the torrent task should time out eventually and then the other task should run |
19:02
🔗
|
ohhdemgir |
eventually.. heh |
19:02
🔗
|
ohhdemgir |
okies :) |
19:02
🔗
|
DFJustin |
takes at least a day, I forget |
19:03
🔗
|
ohhdemgir |
when getting .gov sites from the list do I just ignore headers like |
19:03
🔗
|
ohhdemgir |
Anonymous user logged in. |
19:03
🔗
|
ohhdemgir |
U.S. Government computer, unauthorized use prohibited by Title 18, U.S.C. |
19:03
🔗
|
ohhdemgir |
Welcome, ftp, to ftp,cdc,gov |
19:05
🔗
|
DFJustin |
that's just boiler plate, unauthorized use of any server is prohibited by law |
19:05
🔗
|
DFJustin |
government or no |
19:06
🔗
|
DFJustin |
if it's a publicly listed site that allows anonymous login then presumably that use is authorized |
19:06
🔗
|
ohhdemgir |
sounds good |
19:07
🔗
|
DFJustin |
the courts can be stupid about that though as demonstrated by the weev and manning cases |
19:07
🔗
|
rocode |
DFJustin: Unfortunately, as of yet not upheld by the courts. |
19:07
🔗
|
rocode |
Yeah |
19:08
🔗
|
rocode |
We had a local case where a kid "hacked" the sheriff website by going to a non-listed URL and downloading records. |
19:08
🔗
|
ohhdemgir |
I'm in the uk, ripping us sites from a server in france, fuck it |
19:12
🔗
|
nico |
these countries could extradite you to the usa |
19:12
🔗
|
ohhdemgir |
I'll risk it |
19:13
🔗
|
rocode |
I am in the US. I probably violated six different laws just getting to work this morning. |
19:13
🔗
|
ersi |
nico: and an asteroid *could* hit you in the head |
19:13
🔗
|
rocode |
ersi: Bus is much more likely. |
19:14
🔗
|
ersi |
I said nothing about probability |
19:15
🔗
|
schbirid |
ohhdemgir: nice work with gamefiles by! |
19:15
🔗
|
ohhdemgir |
sorry I let it sit for so long |
19:16
🔗
|
schbirid |
if you consider that long then i don't want to talk about the fileplanet stuff ever again ;P |
19:16
🔗
|
ohhdemgir |
.. I ... I still have some of that.. |
19:16
🔗
|
ohhdemgir |
XD |
19:16
🔗
|
godane |
i'm downloading more cbsnews stuff |
19:16
🔗
|
godane |
:-D |
19:16
🔗
|
godane |
some of the stuff in 2007 is very interesting |
19:19
🔗
|
godane |
is it bad to be uploading 4 things and downloading 4 things at the same time? |
19:19
🔗
|
schbirid |
ohhdemgir: you must have changed nicks ;) |
19:19
🔗
|
godane |
i really think my ocd is kicking in today |
19:20
🔗
|
schbirid |
anyways, if you have leftovers from the id iteration downloading we did, you can safely delete |
19:20
🔗
|
ohhdemgir |
schbirid, I was tarx or tarxvf before |
19:20
🔗
|
schbirid |
:) |
19:20
🔗
|
schbirid |
i am more of a tarxfvz guy |
19:21
🔗
|
ohhdemgir |
heh |
19:22
🔗
|
rocode |
I used p7z because I kept forgetting the tar flags. |
19:23
🔗
|
ohhdemgir |
rocode, that's why I used it as my username XD |
19:23
🔗
|
ohhdemgir |
I always used p7z before |
19:23
🔗
|
rocode |
Ah. |
19:24
🔗
|
ohhdemgir |
schbirid, any site I should get next? |
19:26
🔗
|
schbirid |
if you could do anything to turn reddit back to ~2009 and remove all the fucking image macros (are they still called that?) from the web, that would be nice |
19:26
🔗
|
rocode |
schbirid: I run my own reddit proxy that pretty much does the same thing. That site really went to hell. |
19:27
🔗
|
ohhdemgir |
I think I have enough data to host reddit as it was in 2009 XD |
19:27
🔗
|
rocode |
ohhdemgir: It's ~100gb. There was a redditdev backup floating around. Two tables *shudder* |
19:28
🔗
|
ohhdemgir |
yeah, ish* I have most of it 2007 - early 2013 |
19:29
🔗
|
rocode |
Most reddit data is worthless unless you get their researcher feed, with the amount of fudging they do. |
19:30
🔗
|
ohhdemgir |
pain in the ass though: last time I put it up, admins took it down and asked me to see how they 'wished to handle the release of such data'. never heard back; will put it on ia when I get the chance |
19:31
🔗
|
ohhdemgir |
right now I'm using it to put up things like this http://www.reddit.com/r/AmateurArchives/comments/24vr5r/rgonewild_history_20092013_torrents/ |
19:31
🔗
|
ohhdemgir |
https://archive.org/search.php?query=Gonewild%20Data |
19:31
🔗
|
ohhdemgir |
because boobies and data, yiss! |
19:31
🔗
|
balrog |
LOL |
19:32
🔗
|
rocode |
Reddit admins try to avoid overt backups because of the legal mess their user contributed data is. |
19:32
🔗
|
SketchCow |
A few more of those have gone down. |
19:32
🔗
|
balrog |
hah |
19:32
🔗
|
SketchCow |
Like, a couple albums. |
19:32
🔗
|
rocode |
They shut down our /r/theoryofreddit bot because we were using old data to try to create a heuristic moderation system. |
19:32
🔗
|
ohhdemgir |
SketchCow, from the original 220GB one? |
19:32
🔗
|
balrog |
rocode: wow........... |
19:32
🔗
|
SketchCow |
Ostensibly, yes |
19:32
🔗
|
balrog |
the problem you're gonna run into here is that you can't remove a small subset without removing the whole thing |
19:33
🔗
|
ohhdemgir |
SketchCow, tsk, silly, I'm trying to either not include usernames or release those separately now |
19:33
🔗
|
rocode |
balrog, communities as a whole go through this cycle constantly. Slashdot saw the same, fark saw the same. When enough money and public interest occurs, things go to hell. |
19:34
🔗
|
SketchCow |
Awww, it's rocode, our little bucket of reality |
19:34
🔗
|
rocode |
:( |
19:36
🔗
|
ohhdemgir |
balrog, true but I feel warm and fuzzy knowing that ia still has it :3 |
19:36
🔗
|
balrog |
would be nice if there was a way to only dark a portion of an archive |
19:36
🔗
|
ohhdemgir |
agreed |
19:36
🔗
|
rocode |
SketchCow: Someone has to save all this data to hand over to our AI overlords of the future. |
19:36
🔗
|
SketchCow |
Is this the part where rocode is going to win me over to archiving maximalism? |
19:36
🔗
|
* |
SketchCow gets popcorn |
19:37
🔗
|
SketchCow |
http://www.cbc.ca/strombo/content/images/mj-popcorn.gif |
19:37
🔗
|
rocode |
Archiving maximalism? Saving everything? |
19:37
🔗
|
SketchCow |
Regarding the Gonewild Archive situation, your problem is that it's WAY too large and WAY too big for one file. |
19:38
🔗
|
SketchCow |
It should be, like, 4 items, each with 100 files or so. |
19:38
🔗
|
SketchCow |
I say this with full 20/20 hindsight. |
19:38
🔗
|
SketchCow |
I mean, there was no way to know, but now that people are coming out to take issue, it becomes the case. |
19:39
🔗
|
ohhdemgir |
aye, seems without even linking to it underscor's upload went dark too |
19:40
🔗
|
rocode |
I think you may be mistaking me for someone else, SketchCow, I was referring to historical voting data and comment history of reddit. |
19:40
🔗
|
ohhdemgir |
the only way around it is archiving each user as a new item, which is a pain in the ass |
19:41
🔗
|
SketchCow |
Well, no. |
19:41
🔗
|
SketchCow |
The way I proposed will make it so the part that needs replacement is much smaller and manageable. |
19:42
🔗
|
SketchCow |
Right now you have to basically put an aircraft carrier up on blocks to yank a single bolt off the bottom. |
19:42
🔗
|
SketchCow |
Having to put ONE truck out of a fleet up on blocks is still annoying but comparatively OK. |
19:44
🔗
|
rocode |
Well, wouldn't it be easier to handle smaller chunks that you can always combine into a large chunk later if needed? |
19:44
🔗
|
ohhdemgir |
hmm, true, we'll see when it comes to uploading more |
19:44
🔗
|
ohhdemgir |
I think there is around 120GB waiting again |
19:45
🔗
|
balrog |
ohhdemgir: how difficult is it to make a script that splits it? |
19:46
🔗
|
ohhdemgir |
each user has their own folder, it can be split any old way and still make sense |
19:46
🔗
|
ohhdemgir |
so, easy |
19:46
🔗
|
balrog |
then where is this difficult? |
19:47
🔗
|
SketchCow |
Beyond that, you don't need to split it within users. |
19:47
🔗
|
SketchCow |
You can just split users. |
19:47
🔗
|
balrog |
rocode: how do they deal with the people who run the "undelete comment" stuff? |
19:47
🔗
|
ohhdemgir |
^ this |
19:47
🔗
|
SketchCow |
No user's going to be more than a gig. |
19:47
🔗
|
balrog |
yeah, since a takedown request will nearly always be for at least an entire user |
19:47
🔗
|
ohhdemgir |
SketchCow, some are 3-5GB |
19:47
🔗
|
SketchCow |
No user's going to be more than 10 gigs. |
19:47
🔗
|
ohhdemgir |
lol |
19:47
🔗
|
SketchCow |
Same difference. |
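[Editor's sketch] The split discussed above (whole user folders grouped into several smaller items, never splitting inside a user) could look roughly like the following. This is only an illustration under assumptions: the function names, the greedy alphabetical packing, and the size cap are hypothetical, not what anyone in the channel actually ran.

```python
from pathlib import Path


def folder_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def split_users(sizes: dict[str, int], max_bytes: int) -> list[list[str]]:
    """Greedily pack whole user folders into groups of at most max_bytes.

    A user folder is never split; a folder larger than max_bytes
    simply ends up in a group of its own.
    """
    groups: list[list[str]] = []
    current: list[str] = []
    used = 0
    for user in sorted(sizes):  # alphabetical, so the grouping is reproducible
        size = sizes[user]
        # Close the current group if this user would push it over the cap.
        if current and used + size > max_bytes:
            groups.append(current)
            current, used = [], 0
        current.append(user)
        used += size
    if current:
        groups.append(current)
    return groups
```

Each resulting group would become one item, so a takedown for a single user only means re-uploading that one group. Sizes can be gathered with something like `{p.name: folder_size(p) for p in root.iterdir() if p.is_dir()}`, where `root` is the (assumed) directory holding one folder per user.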
19:47
🔗
|
rocode |
No user will need more than 640kb |
19:48
🔗
|
rocode |
balrog: They leave it up to the submod staff and note it in the reddit ToS as harassment. |
19:48
🔗
|
rocode |
a.k.a CYA |
19:48
🔗
|
balrog |
rocode: which part? |
19:48
🔗
|
rocode |
sec |
19:49
🔗
|
balrog |
I'm talking about https://www.unedditreddit.com |
19:49
🔗
|
balrog |
(lol expired cert) |
19:50
🔗
|
rocode |
balrog: Oh, thought you meant the auto screenshot bot |
19:50
🔗
|
rocode |
3rd party sites are 3rd party, therefore they don't care. If it becomes an issue, they ban the IPs from the API. |
19:50
🔗
|
balrog |
do they use the API or do they scrape? |
19:51
🔗
|
rocode |
API, AFAIK. |
19:52
🔗
|
rocode |
Heh, firefox mobile does not offer a temp allow for expired certs. |
19:53
🔗
|
rocode |
Oh, they are scraping. Those guys got banned from the API. |
19:55
🔗
|
godane |
SketchCow: you may be getting a marxist.org section for texts |
19:56
🔗
|
godane |
i'm trying to upload the pdfs i got from that site |
19:59
🔗
|
rocode |
godane: Was your download prior to the april 30th purge? |
20:00
🔗
|
godane |
yes |
20:01
🔗
|
godane |
i got about 80gb |
20:01
🔗
|
godane |
but i think it started to redownload stuff so i killed it |
20:01
🔗
|
godane |
i think it was redownloading cause i had -E option in my script |
20:02
🔗
|
godane |
which makes .html files if there is a folder link |
20:02
🔗
|
godane |
but the site has a lot of .htm files |
20:03
🔗
|
godane |
so i guess it was redownloading with .htm file |
20:03
🔗
|
godane |
its better to have the -E option, otherwise you get folder install of folder.html |
20:04
🔗
|
godane |
*instead |
20:04
🔗
|
godane |
that way the folder/file.pdf will get downloaded |
20:05
🔗
|
godane |
otherwise it will say folder can't be written to, since folder would be both a file and a folder |
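[Editor's sketch] The file-versus-folder clash godane describes can be illustrated with a toy URL-to-path mapping. The tool is not named in the log; this assumes a wget-style mirror, where `-E` is `--adjust-extension` and appends `.html` to HTML pages saved without that extension. The functions below are hypothetical and much simpler than any real mirroring tool: they treat every path whose last segment has no dot as an HTML page.

```python
from urllib.parse import urlsplit


def local_path(url: str, add_html: bool) -> str:
    """Map a URL to a relative local path, roughly as a mirror might.

    Simplifying assumption: a last path segment without a dot is an
    HTML page, and add_html mimics appending ".html" to such pages.
    """
    path = urlsplit(url).path.strip("/") or "index"
    last = path.rsplit("/", 1)[-1]
    if add_html and "." not in last:
        path += ".html"
    return path


def find_collisions(urls: list[str], add_html: bool) -> list[str]:
    """Return local paths that would have to be both a file and a directory."""
    files = {local_path(u, add_html) for u in urls}
    dirs = set()
    for f in files:
        parts = f.split("/")
        # Every proper prefix of a file path must exist as a directory.
        for i in range(1, len(parts)):
            dirs.add("/".join(parts[:i]))
    return sorted(files & dirs)
```

With the two URLs `http://example.org/folder` and `http://example.org/folder/file.pdf`, the mapping without the extension fix needs `folder` to be a file and a directory at once; with it, the page becomes `folder.html` and the directory `folder/` is free to hold `file.pdf`.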
20:06
🔗
|
godane |
here is the upload item: https://archive.org/details/www.marxists.org-20140426 |
20:48
🔗
|
garyrh |
http://alcatel-lucent.com/bstj/ |
20:49
🔗
|
SketchCow |
garyrh: Already imported into archive.org. |
20:49
🔗
|
garyrh |
great! |
21:51
🔗
|
exmic |
failing miserably at reserving a hotel room, apparently |
21:51
🔗
|
exmic |
who knew this was so hard |
21:51
🔗
|
SketchCow |
Where? |
21:52
🔗
|
SketchCow |
What are you using? |
21:52
🔗
|
SketchCow |
I tend to use Kayak these days for the US |
21:52
🔗
|
exmic |
cool |
21:52
🔗
|
exmic |
hipmunk told me that something broke and it didn't get reserved |
21:52
🔗
|
exmic |
some other site said my credit card said "NO" |
21:54
🔗
|
SketchCow |
CVS Loyalty Cards are not Credit Cards, you know that right |
21:54
🔗
|
exmic |
hmmm |
21:54
🔗
|
exmic |
really? |
21:54
🔗
|
SketchCow |
I know |
21:54
🔗
|
SketchCow |
I KNOWWWWW |
21:54
🔗
|
SketchCow |
I was surprised too |
21:55
🔗
|
exmic |
you used to be able to pay for airplane telephone service with radioshack gift cards |
21:55
🔗
|
SketchCow |
That was one aaaaangry hooker |
21:55
🔗
|
exmic |
because they had a credit-card-like magstripe |
21:55
🔗
|
exmic |
lol |
21:55
🔗
|
SketchCow |
Oh yeah, because they couldn't run the thing until you landed |
21:55
🔗
|
SketchCow |
So always use the seat next to you |
21:55
🔗
|
exmic |
ding ding ding |
21:56
🔗
|
exmic |
despite having phones on planes, they couldn't use modems on planes |
21:56
🔗
|
exmic |
or something |
21:56
🔗
|
SketchCow |
Props to the hand-wavy cockblaster who pooh-poohed that scenario at the meeting |
21:59
🔗
|
exmic |
maybe spending money on canadian hotels is not a scenario that my bank envisioned me wanting to do |
21:59
🔗
|
SketchCow |
I do find you have to call ahead to the bank to get the card opened to that. |
21:59
🔗
|
SketchCow |
Like, when I was married to a canadian, this came up all the time. |
21:59
🔗
|
SketchCow |
"I'm going to Canadia, free the card" |
22:00
🔗
|
SketchCow |
Otherwise I was Mr. Big for dinner and couldn't buy a gum stick the next morning. |
22:01
🔗
|
exmic |
lol |
22:01
🔗
|
exmic |
canada, pfeh, who goes there |
22:01
🔗
|
SketchCow |
My Boston bank would block my card if I bought 4 things in NYC |
22:03
🔗
|
exmic |
to be fair, new york is really far from boston |
22:04
🔗
|
exmic |
there aren't even any direct flights |
22:06
🔗
|
exmic |
hm, what's the state department say about traveling to canada |
22:06
🔗
|
exmic |
are there any dictatorships there |
22:28
🔗
|
Smiley |
urgh fucking censorship |