Time |
Nickname |
Message |
00:11
🔗
|
aaaaaaaaa |
bebzol: you could download the tracker dev env and set up a network on virtualbox for it and just change the tracker_host in your pipeline.py file |
00:14
🔗
|
aaaaaaaaa |
https://github.com/ArchiveTeam/archiveteam-dev-env or follow the directions here: http://archiveteam.org/index.php?title=Dev/Tracker |
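A minimal sketch of that change. It assumes the usual seesaw layout, where pipeline.py keeps the tracker address in a `TRACKER_HOST` constant (called tracker_host above) and requests items from `http://HOST/PROJECT`. The project name `example` and the address `192.168.56.101`, an address on VirtualBox's default host-only network, are placeholders for whatever your dev tracker VM actually uses.

```python
# Hypothetical pipeline.py excerpt: point the grab at a local dev tracker.
# Both values are placeholders, not real project settings.
TRACKER_ID = "example"           # project name registered on the dev tracker
TRACKER_HOST = "192.168.56.101"  # dev tracker VM on a VirtualBox host-only network


def tracker_url(host: str, project: str) -> str:
    """Base URL the pipeline requests items from (assumed http://HOST/PROJECT layout)."""
    return "http://%s/%s" % (host, project)


if __name__ == "__main__":
    print(tracker_url(TRACKER_HOST, TRACKER_ID))
```

Switching back to production should then only mean restoring the original `TRACKER_HOST` value.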
00:20
🔗
|
bebzol |
yay, thanks :) |
02:41
🔗
|
dashcloud |
any idea why wpull is telling me "ImportError: No module named 'sqlalchemy'"? I used the pip install wpull method |
03:00
🔗
|
garyrh |
try pip install -U wpull |
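If upgrading doesn't help, one common cause is that pip installed the package into a different Python than the one that runs wpull. The sketch below is a quick diagnostic and is not part of wpull. It asks the current interpreter whether it can locate the modules. Run it with the same `python` that launches wpull.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level modules in `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show which interpreter is being checked; pip may have installed into another one.
    print(sys.executable)
    print(missing_modules(["wpull", "sqlalchemy"]))
```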
05:11
🔗
|
amerrykan |
i'm trying to pull down every video from LA Podfest 2014 |
05:11
🔗
|
amerrykan |
i've got all but three videos, youtube-dl fails with a weird error |
05:13
🔗
|
amerrykan |
Go Bayside! - player.vimeo.com/video/107790103 |
05:13
🔗
|
amerrykan |
Road Stories - player.vimeo.com/video/107786707 |
05:13
🔗
|
amerrykan |
The JV Club - player.vimeo.com/video/107793309 |
05:13
🔗
|
amerrykan |
i'm about to head to bed, but if anyone can suggest, i'd appreciate it |
05:32
🔗
|
danneh_ |
amerrykan: I'm giving them a shot, will let you know how they go |
05:36
🔗
|
amerrykan |
if they even start for you, that's further than I've got |
05:38
🔗
|
danneh_ |
yep, first one finished and second one started, maybe try: pip install --upgrade youtube-dl |
05:39
🔗
|
danneh_ |
or it could've blocked you for doing too many at one time? though I've downloaded about 30 at once and not had issues |
05:40
🔗
|
amerrykan |
i'm on arch, so freshness shouldn't be my problem |
05:41
🔗
|
amerrykan |
i'm getting 'unable to extract info' type errors |
05:44
🔗
|
danneh_ |
fair enough, that's weird |
05:46
🔗
|
danneh_ |
I'd probably still try either that pip or youtube-dl -U , since that issue is generally related to out-of-date extractor info and shouldn't hurt anything |
05:47
🔗
|
danneh_ |
in any case, I can upload these if all else fails |
08:05
🔗
|
danneh_ |
amerrykan: downloaded both of those, let me know if you want me to upload them somewhere |
08:05
🔗
|
danneh_ |
all three of those* |
08:32
🔗
|
sharpobje |
hi, any way to find a particular twitch vod on the internet archive? |
08:32
🔗
|
sharpobje |
I am looking for the (only?) vod saved for the channel leveluplive2, with highlight ID 1854389 |
08:42
🔗
|
Atluxity |
sharpobje: lurk around, someone will be with you eventually |
10:53
🔗
|
bebzol |
hello! I'm developing seesaw and lua scripts for archiving ownlog.com service - could you create an ownlog-grab github repository for it? |
12:06
🔗
|
Muad-Dib |
arkiver, ivan` ^ |
15:18
🔗
|
stevenola |
SketchCow: Looking for your opinion on something |
15:19
🔗
|
stevenola |
I run artpacks.org, and I've had someone contact me asking for full packs that they've participated in to be removed from the archive |
15:19
🔗
|
stevenola |
Because their art is being indexed by google, and contains their real name, mailing address, phone number, ex-girlfriend names, etc |
15:20
🔗
|
stevenola |
I've already added their art to robots.txt as a quick fix for this issue |
15:20
🔗
|
stevenola |
And I have no intention to remove full packs |
15:21
🔗
|
stevenola |
But I'm curious what you think about this situation |
15:22
🔗
|
DFJustin |
I'd suggest adding a robots.txt rule to whitelist ia_archiver |
15:22
🔗
|
balrog |
stevenola: how were these packs produced? |
15:22
🔗
|
DFJustin |
because what you really care about is google etc |
15:22
🔗
|
stevenola |
I've written (but not yet sent) an email describing what I did with robots.txt, and offered to censor their phone numbers and personal details. I think I'm willing to remove the specific art, but I'm still curious to hear your thoughts |
15:22
🔗
|
balrog |
were they collected from other places? |
15:22
🔗
|
balrog |
I'd censor the personal details, since that's what they're worried about |
15:23
🔗
|
stevenola |
balrog: it's old artscene artwork. It was produced by the artists, published by an "artgroup" and distributed to many sources by the group via BBS, FTP and web. |
15:23
🔗
|
raylee |
what'd i miss? |
15:23
🔗
|
raylee |
damn bnc |
15:25
🔗
|
aaaaaaaaa |
A part of me can't help but think that it is available elsewhere and they put it on the internet and they knew it was publicly posted |
15:25
🔗
|
DFJustin |
raylee: http://badcheese.com/~steve/atlogs/?chan=archiveteam |
15:25
🔗
|
balrog |
DFJustin: whitelist ia_archiver globally |
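What DFJustin and balrog describe is a robots.txt with a separate group for the Internet Archive's crawler. An empty `Disallow:` means "allow everything" for ia_archiver, while the `*` group blocks the sensitive paths for every other bot. The paths below are hypothetical. Python's standard `urllib.robotparser` is used here only to show how a compliant crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: let the archive crawler in, keep everyone else out of two packs.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /packs/example-pack-1/
Disallow: /packs/example-pack-2/
"""


def build_parser(text):
    """Parse robots.txt text the way a compliant crawler would."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    rp.modified()  # mark the rules as loaded so can_fetch() evaluates them
    return rp


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "http://example.org/packs/example-pack-1/file.ans"
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, rp.can_fetch(agent, url))
```

Running it prints `ia_archiver True` and `Googlebot False`.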
15:25
🔗
|
balrog |
aaaaaaaaa: hah, yeah |
15:26
🔗
|
stevenola |
aaaaaaaaa: yes, artpacks were basically distributed in a "hey, here's the file. pass it around!" way |
15:26
🔗
|
stevenola |
i understand the artist's concern |
15:27
🔗
|
stevenola |
just looking for other perspectives or thoughts i haven't considered |
15:28
🔗
|
aaaaaaaaa |
I'd whitelist archive.org though. No use in potentially deleting it forever and it won't show up unless you specifically look for it. |
15:29
🔗
|
stevenola |
have i done it correctly? http://artpacks.org/robots.txt |
15:30
🔗
|
stevenola |
SketchCow: Since you're familiar with the artscene, your thoughts are greatly appreciated (when you get a chance) |
15:30
🔗
|
schbirid |
looks correct to me |
15:31
🔗
|
schbirid |
if you want, someone from here could initiate a full crawl for archive.org |
15:31
🔗
|
schbirid |
(while ignoring robots.txt) |
15:31
🔗
|
aaaaaaaaa |
did you delete them too, I'm getting 404s |
15:31
🔗
|
DFJustin |
stevenola: you might try emailing him instead since he seems to be afk |
15:31
🔗
|
aaaaaaaaa |
on some |
15:31
🔗
|
stevenola |
No worries about that. The actual content is all over. I think most of the pre-2004 content is on archive.org already |
15:33
🔗
|
stevenola |
aaaaaaaaa: ah, my script generated the urls to be blocked incorrectly :0 |
15:33
🔗
|
stevenola |
:) |
15:33
🔗
|
stevenola |
goddamn this new keyboard |
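Because the blocked URLs came from a script, it may help to check the generated rules against the list of paths that were meant to be hidden before deploying them. This is a sketch using `urllib.robotparser`. The helper names, the example paths and the `Googlebot` agent are assumptions. The missing-leading-slash case is only one example of the kind of mistake this catches, and may not be what went wrong here.

```python
from urllib.robotparser import RobotFileParser


def disallow_rules(paths, agent="*"):
    """Build a robots.txt group blocking each path, forcing a leading slash."""
    lines = ["User-agent: " + agent]
    for path in paths:
        lines.append("Disallow: " + (path if path.startswith("/") else "/" + path))
    return lines


def still_fetchable(robots_lines, paths, agent="Googlebot"):
    """Return the paths that `agent` may still fetch under these rules."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    rp.modified()  # mark the rules as loaded so can_fetch() evaluates them
    return [p for p in paths if rp.can_fetch(agent, p)]


if __name__ == "__main__":
    # A rule without a leading slash never matches, so the file stays reachable.
    print(still_fetchable(["User-agent: *", "Disallow: packs/a/"], ["/packs/a/x.ans"]))
```

An empty result from `still_fetchable` means every intended path is blocked for that agent.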
15:33
🔗
|
stevenola |
DFJustin: do you have a contact email for him? |
15:35
🔗
|
DFJustin |
jason@textfiles.com |
15:37
🔗
|
stevenola |
thank you!thank you! |
17:19
🔗
|
joepie91 |
stevenola: I don't think IA is indexed by Google |
17:19
🔗
|
joepie91 |
so if the concern is name find-ability, that shouldn't be an issue |
17:19
🔗
|
joepie91 |
err |
17:19
🔗
|
joepie91 |
IA is indexed |
17:19
🔗
|
joepie91 |
I meant I don't think the wayback is indexed by Google * |
17:21
🔗
|
SketchCow |
Boop. |
17:22
🔗
|
SketchCow |
I never respond to those. |
17:58
🔗
|
stevenola |
Ah. Maybe that would have been a good strategy |
17:58
🔗
|
stevenola |
:) |
17:59
🔗
|
stevenola |
"strategy" |
18:10
🔗
|
signius |
stevenola, it's the thing: the Internet is written in INK, not pencil |
18:12
🔗
|
stevenola |
preaching to the choir |
18:12
🔗
|
signius |
:D |
18:44
🔗
|
namespace |
So why does the warrior lose all its data on shutdown? |
18:45
🔗
|
aaaaaaaaa |
it reformats on startup |
18:45
🔗
|
namespace |
Yes but why. |
18:46
🔗
|
Jonimus |
to make sure it has a clean slate such that the next run doesn't run into issues with space or leftover data. |
18:46
🔗
|
namespace |
Oh well, I have to shut down my computer sometimes and feel incredibly guilty losing you guys 1.2 gigs of data. |
18:47
🔗
|
chronomex |
you can hit the "suspend" button in virtualbox |
18:47
🔗
|
aaaaaaaaa |
you can pause the virtual machine, it can usually start right back up where it left off. |
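From a script, you can do the same thing with VirtualBox's `VBoxManage` CLI. `controlvm ... savestate` writes the VM's state to disk and stops it, and a later `startvm` resumes from that saved state. A plain pause keeps the VM in host memory, so it does not survive powering off the computer. In the sketch below, the VM name is a placeholder for whatever VirtualBox Manager shows.

```python
import subprocess

VM_NAME = "archiveteam-warrior"  # hypothetical: use the name shown in VirtualBox Manager


def savestate_cmd(vm_name):
    """VBoxManage command that saves the running VM's state to disk and stops it."""
    return ["VBoxManage", "controlvm", vm_name, "savestate"]


def resume_cmd(vm_name):
    """VBoxManage command that starts the VM without a GUI window, restoring saved state."""
    return ["VBoxManage", "startvm", vm_name, "--type", "headless"]


def run(cmd):
    """Run a command, raising CalledProcessError if VBoxManage reports failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(savestate_cmd(VM_NAME))  # before shutting down the host
    # later, after booting: run(resume_cmd(VM_NAME))
```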
18:47
🔗
|
chronomex |
yeah |
18:47
🔗
|
namespace |
chronomex: Oh so that is how you're supposed to do it? |
18:47
🔗
|
namespace |
Okay. |
18:47
🔗
|
Jonimus |
If you tell the warrior to shut down using the web interface it will shutdown once the data is sent. |
18:47
🔗
|
chronomex |
^ |
18:47
🔗
|
chronomex |
but that can take a little while, depending |
18:47
🔗
|
namespace |
Well that'll take way too long. :P |
18:47
🔗
|
Jonimus |
yeah |
18:48
🔗
|
Jonimus |
usually they do a few release claims towards the end of a project to make sure data lost in that manner is grabbed by someone else. |
18:49
🔗
|
namespace |
I mean this seems like sort of a 'gotcha' to me and I feel like there's probably some better solution. |
18:50
🔗
|
aaaaaaaaa |
Then just save the state when you close it and want to shut off the computer, unless I am missing your point. |
18:50
🔗
|
Jonimus |
there is like a vbox setting to have it suspend rather than shutdown boxes when you shutdown |
18:51
🔗
|
namespace |
aaaaaaaaa: My point is that this isn't intuitive for a first time user to know to do. |
18:52
🔗
|
Jonimus |
which is why we have the release claims methodology. |
18:52
🔗
|
namespace |
K. |
18:52
🔗
|
yipdw |
also why it's good practice for items to not be too big |
18:52
🔗
|
chronomex |
yeah |
18:52
🔗
|
chronomex |
ten minutes of downloading is a nice number |
18:53
🔗
|
namespace |
Well it doesn't help that my upload is like a soda straw compared to my download. |
18:53
🔗
|
namespace |
I'm not sure if that's on my end or Archive.org's end. |
18:53
🔗
|
Jonimus |
as do most home users. |
18:54
🔗
|
Jonimus |
and many projects are uploaded to a staging server rather than directly to Archive.org |
18:54
🔗
|
namespace |
I mean obviously the warrior is rate limiting, and it's very plausible that the staging server/etc has trouble with receiving the data as fast as it's being grabbed. |
18:55
🔗
|
chronomex |
maybe we should investigate overlapping the upload and download phases |
18:55
🔗
|
Jonimus |
that depends on the thing being grabbed, I know that was an issue for twitch as there were a large number of VPS's being used. |
18:55
🔗
|
Jonimus |
They kinda already do overlap. |
18:55
🔗
|
chronomex |
hm, ok |
18:55
🔗
|
chronomex |
i'm not very up on things |
18:56
🔗
|
Jonimus |
It starts the next download as it uploads the previous task. |
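The overlap Jonimus describes is a two-stage pipeline. The sketch below illustrates it with one upload thread and a one-slot queue, assuming hypothetical `download` and `upload` callables. This is not how seesaw schedules tasks internally, just the same idea in miniature.

```python
import queue
import threading


def run_overlapped(items, download, upload):
    """Download item n+1 while item n is being uploaded; return upload results in order."""
    handoff = queue.Queue(maxsize=1)  # at most one finished download waits for the uploader
    results = []

    def uploader():
        while True:
            data = handoff.get()
            if data is None:  # sentinel: no more items
                break
            results.append(upload(data))

    worker = threading.Thread(target=uploader)
    worker.start()
    for item in items:
        # Blocks only while an earlier finished download is still waiting to be uploaded.
        handoff.put(download(item))
    handoff.put(None)
    worker.join()
    return results


if __name__ == "__main__":
    print(run_overlapped(["a", "b", "c"], lambda x: x.upper(), lambda d: "up:" + d))
```

With `maxsize=1`, a slow upstream link still limits overall throughput, but the next download starts immediately instead of waiting for the previous upload to finish, and finished items cannot pile up without bound.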
18:56
🔗
|
chronomex |
oh, yeah, i guess it does |
18:56
🔗
|
chronomex |
my bad |
18:58
🔗
|
bebzol |
hi! can anyone here create a github repository for me? |
18:59
🔗
|
bebzol |
i need ownlog-grab to start a rescue of a blogging platform |
18:59
🔗
|
bebzol |
i'm almost done with scripts and lua |
19:02
🔗
|
namespace |
bebzol: What blogging platform? |
19:02
🔗
|
yipdw |
ownlog |
19:03
🔗
|
* |
namespace still wants to hit ravearchive |
19:03
🔗
|
sharpobje |
hi, any way to find a particular twitch vod on the internet archive? |
19:03
🔗
|
sharpobje |
I am looking for the (only?) vod saved for the channel leveluplive2, with highlight ID 1854389 |
19:03
🔗
|
bebzol |
it's ownlog.com - a platform for about 45 000 blogs in Poland |
19:03
🔗
|
bebzol |
it's rotting away as its owners don't seem to care |
19:03
🔗
|
namespace |
bebzol: How can I help? |
19:05
🔗
|
bebzol |
I can prepare seesaw script and lua (almost done). I've created a list of all items to download - just don't know what to do next ;). I suppose I should put this on github repository and send someone an item list (about 45 000 items - each item is a particular subdomain) |
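Before handing roughly 45,000 subdomains to an admin, it may help to normalize and de-duplicate the list. The sketch below assumes the list holds one bare subdomain label per line (the `foo` in `foo.ownlog.com`). The validation rule is the standard DNS hostname-label syntax and is nothing ownlog-specific.

```python
import re

# Hostname label (RFC 1123): 1-63 letters, digits or hyphens, not starting or ending with a hyphen.
LABEL_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def clean_items(raw_names):
    """Lowercase, validate and de-duplicate labels, keeping first-seen order.

    Returns (valid_labels, rejected_raw_names); blank lines are skipped.
    """
    seen = set()
    good, bad = [], []
    for name in raw_names:
        label = name.strip().lower()
        if not label:
            continue
        if LABEL_RE.fullmatch(label):
            if label not in seen:
                seen.add(label)
                good.append(label)
        else:
            bad.append(name)
    return good, bad


if __name__ == "__main__":
    print(clean_items(["Foo", "foo", " bar ", "-bad", "", "ok-1"]))
```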
19:06
🔗
|
garyrh |
you can also create your own repo then transfer ownership over to archiveteam |
19:08
🔗
|
bebzol |
this may be an idea |
19:09
🔗
|
bebzol |
whom should I contact to do it? |
19:10
🔗
|
garyrh |
probably yipdw or chfoo |
19:10
🔗
|
aaaaaaaaa |
I'd make the repo, test it with your own tracker and then let us know when it is done. Then the admins will take a look. |
19:11
🔗
|
bebzol |
all right |
19:15
🔗
|
aaaaaaaaa |
I think most of the admins are currently taking care of their day jobs. |
19:24
🔗
|
yipdw |
^ |
19:24
🔗
|
yipdw |
also if someone knows of a good way to trace leaks in Tomcat's fucking connection pool that would be awesome |
19:24
🔗
|
yipdw |
logAbandoned property seems to do jack shit |
19:25
🔗
|
midas |
best way to trace stuff in tomcat is to shoot it with a tank. |
19:25
🔗
|
yipdw |
not an option |
19:26
🔗
|
yipdw |
bebzol: shoot me your github username, I'll get the repo and permissions set up |
19:27
🔗
|
bebzol |
it's "basement-labs" |
19:27
🔗
|
bebzol |
thanks in advance |
19:28
🔗
|
midas |
yipdw: are you using eclipse for tracing yet? |
19:29
🔗
|
yipdw |
midas: IntelliJ, but I suspect I can do something similar |
19:29
🔗
|
yipdw |
the production application isn't configured with remote debugging etc though |
19:29
🔗
|
yipdw |
I guess I could turn that on |
19:29
🔗
|
yipdw |
anyway #-bs |
19:29
🔗
|
midas |
yeah lets move over there |
19:29
🔗
|
yipdw |
bebzol: invitation emailed, repo online |
19:29
🔗
|
bebzol |
thx :) |
20:26
🔗
|
bebzol |
does anyone know how to debug a pipeline script? I get info that wget failed - but no further info |
21:25
🔗
|
dserodio |
bebzol I've never tried, but I know Python. did you try adding a "-v" to wget_args (around line 216) ? |
21:26
🔗
|
bebzol |
unfortunately - no wget output is printed - that is the problem |
21:26
🔗
|
bebzol |
but I've already resolved my problem :) |
21:28
🔗
|
ersi_ |
It's probably because the exit/return code wasn't as the pipeline wished |
21:33
🔗
|
bebzol |
nah, I didn't set variables in python - item_type and item_value. I suppose this is important :P |
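For anyone hitting the same silent failure: many ArchiveTeam grab scripts derive these two fields from the tracker's item name, which arrives as a single string. If a %-style template (as `ItemInterpolation` strings in pipeline.py typically are) refers to a key that was never set, formatting raises KeyError before wget runs. That would fit the missing wget output. The sketch below is plain Python and assumes a `type:value` naming scheme. A dict stands in for seesaw's item, and the `blog` type and URL template are hypothetical.

```python
def split_item_name(item):
    """Fill item['item_type'] and item['item_value'] from an item name like 'blog:example'."""
    item_type, item_value = item["item_name"].split(":", 1)
    item["item_type"] = item_type
    item["item_value"] = item_value
    return item


def blog_url(item):
    """Hypothetical URL template; %-formatting raises KeyError if item_value was never set."""
    return "http://%(item_value)s.ownlog.com/" % item


if __name__ == "__main__":
    item = {"item_name": "blog:example"}  # plain dict standing in for seesaw's item
    try:
        blog_url(item)
    except KeyError as missing:
        print("not set yet:", missing)
    print(blog_url(split_item_name(item)))
```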