Time | Nickname | Message |
00:35 | SketchCow | OK, here we go - uploading 100+ CD-ROMs |
00:39 | godane | cool |
00:46 | dashcloud | awesome! |
00:54 | SketchCow | http://archive.org/search.php?query=collection%3Acdbbsarchive&sort=-publicdate |
00:54 | SketchCow | Watch the fun |
00:54 | SketchCow | they'll pop in like crazy |
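The search links pasted in this log are ordinary query strings: `collection:<name>` goes in `query`, and `sort=-publicdate` lists the newest uploads first. A minimal Python sketch that rebuilds them. The helper name `collection_search_url` is hypothetical, and the `search.php` endpoint is simply the one used in these links, which may no longer be current:

```python
from urllib.parse import urlencode

# Endpoint as it appears in the links in this log; archive.org's search UI may have moved since.
SEARCH_BASE = "http://archive.org/search.php"


def collection_search_url(collection, sort="-publicdate"):
    """Search URL listing one collection, newest uploads first by default."""
    # urlencode percent-encodes the ':' in "collection:<name>" as %3A.
    return SEARCH_BASE + "?" + urlencode({"query": f"collection:{collection}", "sort": sort})
```

`collection_search_url("commodore_c64_books")` produces the same link SketchCow posts later for the C64 books.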
00:56 | godane | SketchCow: you should get me access to cdbbsarchive and archiveteam |
00:57 | godane | just so when i add web dumps they're not stuck in texts forever |
01:00 | dashcloud | why would web dumps go in cdbbsarchive? |
01:01 | godane | they go to archiveteam |
01:01 | dashcloud | okay, that makes more sense |
01:01 | godane | access to cdbbsarchive will help me when uploading stuff like the twilight series of cds/dvds |
01:18 | SketchCow | There is no situation where you would upload twilight DVDs to that collection. |
01:18 | SketchCow | It's software, godane. |
01:19 | godane | ok |
01:20 | godane | why? |
01:20 | godane | it's dvd iso or bin/cue image |
01:20 | SketchCow | Twilight. Like the Twilight movies? |
01:20 | godane | no |
01:20 | SketchCow | Explain to me what they are. Link? |
01:21 | SketchCow | 02 |
01:22 | godane | https://archive.org/details/cdrom-twilight-003 |
01:22 | godane | it's shareware cds/dvds |
01:24 | godane | i converted the first 5 isos from mdf files |
01:24 | godane | the mdf and mds files are still there too |
01:29 | balrog | keep the mdf/mds, yes |
01:31 | godane | it's only the first 15 that i have in mdf/mds format |
01:31 | godane | after that they're in iso or bin/cue |
01:31 | balrog | bin/cue is preferred over iso |
01:31 | balrog | but still not great for multisession/etc |
01:37 | SketchCow | OK, you have access to cdbbsarchive |
01:37 | godane | anyways i have uploaded over 44k videos to g4video-web |
01:37 | godane | cool |
01:38 | SketchCow | And archiveteam |
01:41 | godane | great |
01:42 | godane | now i don't have to wait to push my stuff to your collection |
01:42 | godane | *my stuff to be added to your collection |
02:07 | Start | WHAT FORSOOTH, PRITHEE TELL ME THE SECRET |
02:07 | BlueMax | "yahooschmahoo" |
02:12 | SketchCow | "yahoosucks" |
02:13 | SketchCow | Ignore Bluemax, we leave him around so the children feel better about themselves |
02:13 | SketchCow | 95 CD-ROMs uploaded. 56gb of content. |
02:14 | BlueMax | what collection again? |
02:20 | dashcloud | is there a way to search through the CDs for a particular file name yet? |
02:37 | SketchCow | cdbbsarchive |
02:37 | SketchCow | Absolutely not. |
02:41 | SketchCow | http://archive.org/search.php?query=collection%3Acommodore_c64_books&sort=-publicdate |
02:41 | SketchCow | That will eventually expand out to 95 books |
02:52 | godane | found a 3min interview with john ritter |
03:05 | SketchCow | Now at 27 |
03:05 | SketchCow | Get reading, you'll never catch up. |
03:15 | DFJustin | this pc gamer stuff is all dupes |
03:17 | SketchCow | Is it? I scanned them in myself. |
03:18 | SketchCow | I'm glad there's some dupes, it was getting freaky. |
03:18 | godane | i think i can edit anything |
03:18 | SketchCow | Oh man, godane. |
03:19 | SketchCow | If you went ahead and fixed metadata on any objects in the CD Archive, they'd sing songs about you |
03:19 | godane | you're giving me full admin? |
03:19 | SketchCow | You're an admin of that collection, yes. |
03:19 | SketchCow | Delete something and they'll find you in 12 coffee cans. |
03:19 | SketchCow | Let me do that. |
03:19 | SketchCow | But feel free to increase the quality of metadata for that thing. |
03:24 | godane | uploaded: https://archive.org/details/cdrom-twilight-006 |
03:26 | godane | for a few seconds there i thought i could move my stuff to archiveteam |
03:31 | SketchCow | So many Commodore C64 books. |
03:31 | SketchCow | Uploading manuals by the metric ton, too |
03:31 | SketchCow | http://archive.org/details/commodore_c64_manuals |
03:32 | SketchCow | http://archive.org/stream/1541_Flash_Disk_Speedup_for_SX-64_1985_Skyles_Electric_Works#page/n23/mode/2up |
03:32 | SketchCow | now those are clear instructions |
03:32 | SketchCow | not scary at all |
03:33 | SketchCow | Staring into a wire-mass maw of SX64 misery |
03:53 | godane | SketchCow: i uploaded those JumpStart games |
03:53 | godane | JumpStart 1st Grade: https://archive.org/details/JumpStart_1st_Grade |
03:54 | godane | https://archive.org/details/Jumpstart_2nd_Grade |
03:54 | godane | https://archive.org/details/Jumpstart_3rd_Grade |
03:54 | godane | https://archive.org/details/Jumpstart_4th_Grade |
03:55 | godane | https://archive.org/details/Jumpstart_5th_Grade |
03:55 | godane | can these be moved to cdbbsarchive? |
04:18 | SketchCow | Yes |
04:18 | SketchCow | But they're .zips when ISOs are better. |
04:18 | SketchCow | But whatever, get them in |
04:23 | godane | there are clonecd images in the zips |
04:24 | godane | there were just 4 files in them and i don't know how to do more than one file with my s3 upload script |
04:26 | SketchCow | You don't, just do them over and over |
04:27 | godane | oh |
04:28 | godane | i normally use ftp for more than one file |
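SketchCow's advice is one upload call per file, repeated. A Python sketch of that loop against the Internet Archive's S3-style API, assuming the `s3.us.archive.org` endpoint, `LOW access:secret` authorization, and the `x-amz-auto-make-bucket` / `x-archive-meta-*` headers described in the ias3 documentation. The identifier, keys, file names and helper names (`plan_uploads`, `send`) are placeholders; this is neither godane's custom script nor ias3upload:

```python
import urllib.request
from dataclasses import dataclass, field
from pathlib import Path
from urllib.parse import quote

# Assumption: the Internet Archive's S3-style endpoint, where each file is its own PUT.
ENDPOINT = "https://s3.us.archive.org"


@dataclass
class Upload:
    url: str
    path: Path
    headers: dict = field(default_factory=dict)


def plan_uploads(identifier, paths, access, secret, metadata=None):
    """One Upload per file; only the first creates the item and carries metadata."""
    uploads = []
    for i, path in enumerate(map(Path, paths)):
        headers = {"authorization": f"LOW {access}:{secret}"}
        if i == 0:
            headers["x-amz-auto-make-bucket"] = "1"
            for key, value in (metadata or {}).items():
                headers[f"x-archive-meta-{key}"] = value
        url = f"{ENDPOINT}/{identifier}/{quote(path.name)}"
        uploads.append(Upload(url, path, headers))
    return uploads


def send(upload):
    """PUT one file's bytes; call this once per Upload, in order."""
    request = urllib.request.Request(
        upload.url, data=upload.path.read_bytes(), headers=upload.headers, method="PUT"
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

For a CloneCD set (`.ccd`, `.img`, `.sub`), `for u in plan_uploads(identifier, files, access, secret, {"mediatype": "software"}): send(u)` sends the files one after another. Splitting planning from sending keeps the per-file headers easy to inspect before anything goes over the wire.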
04:37 | godane | SketchCow: i don't see the archiveteam collection when i'm checking in my item |
04:57 | SketchCow | You absolutely have admin access. |
05:07 | godane | i know i do in the software collection |
05:07 | godane | but i don't see a web crawls drop-down menu or anything saying archiveteam |
05:11 | godane | i will test it later with my s3 upload script |
05:16 | SketchCow | Oh, yes. |
05:16 | SketchCow | That's true, you might be fucked there. |
05:18 | omf_ | godane, what script are you using to upload? |
05:18 | godane | a custom script i have |
05:26 | omf_ | And it cannot upload more than one file at once? This is why I do development on the open source ias3upload so that everyone can benefit |
05:58 | ivan` | how do I install and run universal-tracker? I am unfamiliar with gem |
06:01 | ivan` | (and do rubyists run all of these tools as root or what?) |
06:35 | ivan` | okay, I did ~/.gem/ruby/1.9.1/bin/bundle install --path=~/.gem; cp config/redis.json.example config/redis.json; rackup |
06:35 | ivan` | that seems to be working |
06:50 | ivan` | so I'm writing the pipeline stuff for greader, is it reasonable to put 1000 unrelated feeds into each warc? |
06:51 | ivan` | most of them will be 404 anyway |
06:51 | omf_ | depends on the output size, most people have slow ass upload speeds |
06:51 | ivan` | it's mostly gzipped json, no images |
06:51 | omf_ | then I don't think it would be a problem |
06:52 | omf_ | I mention it because we constantly see people asking about uploads, not understanding that uploading, not downloading, is the part that takes forever |
06:52 | ivan` | too bad it ain't .warc.lzma2 |
09:10 | ivan` | the inversion of pipeline is really frustrating :/ |
09:10 | ivan` | trying to pass multiple URLs to WgetDownload |
09:29 | ersi | ivan`: Just pass a list/text file with targets? |
09:40 | Smiley | the warc uploads are STILL going for that site godane |
09:43 | Smiley | 2013-05-22 22:30:39 (92.3 MB/s) - 'rbelmont.mameworld.info/index.html?feed=rss2&page_id=70' saved [751] |
09:43 | Smiley | FINISHED --2013-05-22 22:30:39-- |
09:44 | Smiley | Total wall clock time: 46m 27s |
09:44 | Smiley | Downloaded: 1495 files, 220M in 7m 5s (530 KB/s) |
09:45 | Smiley | :) |
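Smiley's wget summary is self-consistent: 220 MiB over 7m 5s (425 s) is 220 × 1024 KiB ÷ 425 s ≈ 530 KiB/s, so wget's "KB" is 1024-based here, and only about 7 of the 46 wall-clock minutes were spent transferring. A small Python sketch that parses such a line and recomputes the rate; the regex assumes the `Downloaded: N files, SIZE in TIME (RATE)` shape seen in this log, and other wget versions may format it differently:

```python
import re

# Shape of wget's closing summary line, e.g.
# "Downloaded: 1495 files, 220M in 7m 5s (530 KB/s)".
SUMMARY = re.compile(
    r"Downloaded: (?P<files>\d+) files, (?P<size>[\d.]+)(?P<unit>[KMG]?) in "
    r"(?:(?P<h>\d+)h )?(?:(?P<m>\d+)m )?(?P<s>[\d.]+)s "
    r"\((?P<rate>[\d.]+) (?P<rate_unit>[KMG]?)B/s\)"
)
# Assumption: wget's K/M/G prefixes are powers of 1024.
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_summary(line):
    """Return file count, bytes, seconds, and both reported and recomputed KiB/s."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError(f"not a wget download summary: {line!r}")
    size = float(m["size"]) * UNITS[m["unit"]]
    seconds = int(m["h"] or 0) * 3600 + int(m["m"] or 0) * 60 + float(m["s"])
    return {
        "files": int(m["files"]),
        "bytes": size,
        "seconds": seconds,
        "reported_kib_s": float(m["rate"]) * UNITS[m["rate_unit"]] / 1024,
        "computed_kib_s": size / seconds / 1024,
    }
```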
09:46 | omf_ | Smiley, Baljem did this yesterday https://archive.org/details/rbelmont.mameworld.info.warc |
09:49 | ivan` | https://github.com/ArchiveTeam/greader-grab :-) |
09:50 | ivan` | ersi: considered that, but didn't want to code the cleanup |
09:50 | ivan` | I wrote a ConcatenatedList with a .realize |
09:53 | ivan` | some feeds have a massive history, most are near-empty or 404, hopefully slow people will not get 1000 massive feeds :/ |
09:56 | Smiley | omf_: good good. |
09:56 | ersi | ivan`: Maybe make smaller chunks, just in case? |
09:59 | ivan` | yeah, perhaps 100 or 200 |
10:02 | ersi | sounds good |
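Splitting the feed list into smaller per-item batches, as ersi suggests, is a plain chunking step. A Python sketch assuming 200 feeds per item and the 8-digit zero-padded names from ivan`'s example; `batches` and `make_items` are hypothetical helpers, not part of greader-grab:

```python
from itertools import islice


def batches(iterable, size):
    """Yield consecutive lists of at most `size` elements, preserving order."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def make_items(feed_urls, size=200):
    """Pair each batch of feeds with a zero-padded item name like '00000001'."""
    return [(f"{n:08d}", chunk) for n, chunk in enumerate(batches(feed_urls, size), start=1)]
```

With 450 feeds and `size=200` this yields three items of 200, 200 and 50 feeds, so one slow volunteer holds at most 200 feeds at a time.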
10:32 | ivan` | how in the world does this work in posterous-grab? id_function=(lambda item: {"ua": item["user_agent"] }) |
10:33 | ivan` | my item is lacking this user_agent in my pipeline |
10:34 | ersi | The tracker for posterous dishes out user-agents.. and that's where it was introduced first |
10:34 | ersi | It's for overriding the useragent in the WgetDownload |
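The posterous lambda assumes every item carries a `user_agent` key handed out by that project's tracker; with a plain dict standing in for an item that lacks the key, it raises `KeyError`. A minimal Python sketch of a tolerant variant. `DEFAULT_USER_AGENT` is a hypothetical placeholder, and whether a given pipeline should override the user agent at all is a project decision the log does not settle:

```python
# Hypothetical placeholder; a real project would choose its own user agent string.
DEFAULT_USER_AGENT = "Mozilla/5.0 (placeholder)"


def id_function(item):
    """Like posterous-grab's lambda, but tolerates items without 'user_agent'."""
    try:
        user_agent = item["user_agent"]
    except KeyError:
        # Trackers that don't hand out user agents leave the key absent.
        user_agent = DEFAULT_USER_AGENT
    return {"ua": user_agent}
```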
10:35 | ivan` | oh man, I didn't even realize the tracker could dish out json |
10:36 | ivan` | I suppose I should stop doing my string splitting now |
10:36 | ivan` | (unless adding jobs with item_name and some extra unique data is hard?) |
10:42 | ivan` | right now my items are 00000001|feed1`feed2`... |
10:45 | ersi | Why not just give them a name? Like task1 2 3 etc |
10:48 | ivan` | I don't understand :) |
10:50 | ivan` | instead of doing the string splitting stuff, should I write a program for inserting a job like {"item_name": "0000001", "feed_urls": [...]} into redis? |
10:51 | ersi | I think so, but I'm not certain.. I think alard's the man |
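ivan`'s packed item string maps directly onto the JSON job shape he proposes. A small Python sketch; the `|` and backtick separators come from his message, the function names are hypothetical, and how such a job would actually be inserted into the tracker's redis is not covered here:

```python
import json


def parse_packed_item(packed):
    """Split '00000001|feed1`feed2`...' into a name and a list of feed URLs."""
    item_name, _, feeds = packed.partition("|")
    return {"item_name": item_name, "feed_urls": [f for f in feeds.split("`") if f]}


def to_json_item(packed):
    """The same job as JSON, so no separator character can collide with a URL."""
    return json.dumps(parse_packed_item(packed))
```

The JSON form avoids the failure mode of the packed string, where a feed URL that happens to contain a backtick would be split in two.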
17:58 | SketchCow | Hey godane |
19:25 | godane | SketchCow: hey |
19:25 | SketchCow | Hey. |
19:25 | SketchCow | So, in the future, for texts |
19:25 | SketchCow | Please enter subjects as subject[1] subject[2] and so on, not as just subject |
19:26 | SketchCow | Small bug with s3, and how it works |
19:26 | SketchCow | if you do subject=keyword1;keyword2, it just makes a single subject "keyword1;keyword2" instead of the right thing. |
19:26 | SketchCow | This is just for texts. |
19:26 | SketchCow | Got it? |
19:26 | SketchCow | It'll help going forward |
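Over S3, the equivalent of entering separate subject fields is one numbered header per keyword instead of a single `x-archive-meta-subject: keyword1;keyword2`. A Python sketch, assuming the `x-archive-metaNN-<field>` numbering convention from the Internet Archive's S3 documentation; `subject_headers` is a hypothetical helper:

```python
def subject_headers(raw):
    """Turn 'keyword1;keyword2' into one numbered S3 metadata header per keyword."""
    keywords = [k.strip() for k in raw.split(";") if k.strip()]
    # Two-digit numbering: x-archive-meta01-subject, x-archive-meta02-subject, ...
    return {f"x-archive-meta{n:02d}-subject": kw
            for n, kw in enumerate(keywords, start=1)}
```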
19:27 | godane | this is a problem with texts keywords |
19:27 | godane | ok |
19:27 | alard | ivan`: Haven't looked at your pipeline code, but if you need more details, perhaps #warrior is a useful channel for discussion? |
19:29 | godane | SketchCow: i don't have this problem here: |
19:29 | godane | https://archive.org/details/g4tv.com-hdvideo-xml-20130228 |
19:37 | SketchCow | Yes, that's video. |
19:37 | SketchCow | mediatype movies |
19:37 | SketchCow | When it's mediatype texts, the problem shows. |
19:38 | godane | that is a web dump |
19:38 | godane | mediatype is texts |
19:38 | godane | collection is opensource |
19:38 | godane | :P |
19:39 | SketchCow | This is what has been passed to me. |
19:39 | SketchCow | http://archive.org/details/Loadstar-Letter-Issue-45 is the example he used. |
19:40 | godane | how do you fix that problem? |
19:43 | godane | i tried removing it and adding it back in |
19:43 | SketchCow | 15:27 <@SketchCow> Please enter subjects as subject[1] subject[2] and so on, not as just subject |
19:43 | godane | how do you do that in web edit? |
19:44 | godane | there is only subject |
19:44 | SketchCow | Add at the bottom |
19:44 | Nemo_bis | adding another field |
19:50 | godane | ok this is weird |
19:50 | godane | it doesn't cause any problems in search pages |
19:52 | godane | also looks like i only added key words to loadstar letter like this |
19:53 | godane | i think i didn't add any key works to stuff like my sandhills publishing pdfs i uploaded |
19:53 | godane | *words |
19:54 | SketchCow | Great. |
19:55 | godane | so this bug looks to be limited to the page of the item |
19:56 | godane | not search results |
19:56 | godane | so it's not a very big bug |
19:59 | Nemo_bis | not being able to click a keyword is annoying |
19:59 | godane | i know |
19:59 | godane | but this is in a collection |
20:00 | godane | so it's not as bad as it could be |
20:03 | godane | http://developers.slashdot.org/story/13/05/23/1752201/google-code-deprecates-download-service-for-project-hosting |
20:09 | balrog | what... why |
20:10 | godane | figure guys should get one that |
20:11 | godane | *on that |
21:27 | ivan` | alard: thanks, didn't know that existed |
21:27 | Smiley | Pouet still going; |
21:28 | Smiley | newAmerica is now on the dedibox, and I need to write up the metadata for it |
21:38 | Smiley | And it's uploading :) This is gonna take _AWHILE_ |
22:23 | Smiley | godane: on warc 13..... I doubt by the morning it'll even be finished :D |
23:14 | wp494 | in the event that I need to suspend a warrior when it's uploading something, will the upload be fucked or will things be normal upon resume? |
23:16 | Smiley | most likely resume |
23:16 | Smiley | else it'll just start again |
23:16 | Smiley | Depends if it ends up connecting to the same server on resume, and if the resume data is intact. |
23:28 | S[h]O[r]T | i think it's just fos right now where everything is being pushed |
23:29 | Smiley | cool |
23:50 | SketchCow | So, Brewster asked me if one of you wanted to take a shot at a pretty weird little project. |
23:50 | SketchCow | If you want to fuck with WARCs, let me know |
23:57 | omf_ | SketchCow, I already fuck with warcs |
23:58 | omf_ | If it is programming I am all ears |
23:58 | SketchCow | Take a WARC, download it, rip out a collection of "interesting" GIFs, JPGs and PNGs, and make a collage. |
23:58 | SketchCow | Write it so "make a collage" can be different things. |
23:58 | omf_ | Do you mean different file types for the collage or different image types |
23:59 | SketchCow | Sorry. |
23:59 | SketchCow | WARC --> Rip Images --> Make Collage |
23:59 | SketchCow | WARC --> Rip Images --> Make Gallery |
23:59 | SketchCow | WARC --> Rip Images --> Rotating Head of Bear Roaring out GIFs |
23:59 | omf_ | Okay |
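SketchCow's spec is a three-stage pipeline whose last stage is swappable. A minimal stdlib-only Python sketch, assuming uncompressed or per-record-gzipped WARCs with a `Content-Length` on every record. It recognizes GIF/PNG/JPEG by their leading bytes inside `response` records, so chunked or compressed HTTP bodies are skipped, and "interesting" is left as a `keep` callback rather than defined. The stage names `gallery` and `manifest` are hypothetical; a real collage stage (for example one built on Pillow) would register the same way but is not shown:

```python
import base64
import gzip
import html
import io
from dataclasses import dataclass

# Leading bytes of the three formats SketchCow names: GIF, PNG and JPEG.
MAGIC = {b"GIF87a": "gif", b"GIF89a": "gif",
         b"\x89PNG\r\n\x1a\n": "png", b"\xff\xd8\xff": "jpg"}
MIME = {"gif": "image/gif", "png": "image/png", "jpg": "image/jpeg"}


@dataclass
class Image:
    uri: str
    ext: str
    data: bytes


def iter_records(stream):
    """Yield (headers, block) pairs from a binary WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        for header in iter(stream.readline, b""):
            if not header.strip():
                break  # blank line ends the WARC header block
            key, _, value = header.decode("utf-8", "replace").partition(":")
            headers[key.strip().lower()] = value.strip()
        yield headers, stream.read(int(headers.get("content-length", 0)))


def open_warc(path):
    """gzip.open also reads the concatenated per-record members of a .warc.gz."""
    return gzip.open(path, "rb") if str(path).endswith(".gz") else open(path, "rb")


def rip_images(stream, keep=lambda img: True):
    """Stage 2: pull GIF/PNG/JPEG bodies out of HTTP response records."""
    for headers, block in iter_records(stream):
        if headers.get("warc-type") != "response":
            continue
        _, sep, body = block.partition(b"\r\n\r\n")  # strip the HTTP headers
        if not sep:
            continue
        ext = next((e for magic, e in MAGIC.items() if body.startswith(magic)), None)
        if ext is None:
            continue
        img = Image(headers.get("warc-target-uri", ""), ext, body)
        if keep(img):  # "interesting" is a policy hook, left to the caller
            yield img


OUTPUTS = {}


def output(name):
    """Register a final stage, so "make a collage" can be different things."""
    def register(fn):
        OUTPUTS[name] = fn
        return fn
    return register


@output("gallery")
def gallery(images):
    """A bare HTML page with every image inlined as a data: URI."""
    tags = [f'<img src="data:{MIME[i.ext]};base64,{base64.b64encode(i.data).decode()}"'
            f' title="{html.escape(i.uri)}">' for i in images]
    return "<!doctype html>\n" + "\n".join(tags)


@output("manifest")
def manifest(images):
    """A tab-separated list: format, size in bytes, source URL."""
    return "\n".join(f"{i.ext}\t{len(i.data)}\t{i.uri}" for i in images)


def run(source, stage, keep=lambda img: True):
    """WARC -> rip images -> whichever registered stage is named."""
    stream = io.BytesIO(source) if isinstance(source, bytes) else source
    return OUTPUTS[stage](list(rip_images(stream, keep)))
```

Usage would look like `with open_warc("site.warc.gz") as f: page = run(f, "gallery")`, where `site.warc.gz` is a placeholder. Keeping the output stages in a registry means "Make Collage", "Make Gallery" or the roaring bear each become one more decorated function, with the WARC reading and image ripping shared.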