[00:11] bebzol: you could download the tracker dev env and set up a network on virtualbox for it and just change the tracker_host in your pipeline.py file
[00:14] https://github.com/ArchiveTeam/archiveteam-dev-env or follow the directions here: http://archiveteam.org/index.php?title=Dev/Tracker
[00:20] yay, thanks :)
[02:41] any idea why wpull is telling me "ImportError: No module named 'sqlalchemy'"? I used the pip install wpull method
[03:00] try pip install -U wpull
[05:11] i'm trying to pull down every video from LA Podfest 2014
[05:11] i've got all but three videos, youtube-dl fails with a weird error
[05:13] Go Bayside! - player.vimeo.com/video/107790103
[05:13] Road Stories - player.vimeo.com/video/107786707
[05:13] The JV Club - player.vimeo.com/video/107793309
[05:13] i'm about to head to bed, but if anyone can suggest anything, i'd appreciate it
[05:32] amerrykan: I'm giving them a shot, will let you know how they go
[05:36] if they even start for you, that's further than I've got
[05:38] yep, first one finished and second one started, maybe try: pip install --upgrade youtube-dl
[05:39] or it could've blocked you for doing too many at one time? though I've downloaded about 30 at once and not had issues
[05:40] i'm on arch, so freshness shouldn't be my problem
[05:41] i'm getting 'unable to extract info' type errors
[05:44] fair enough, that's weird
[05:46] I'd probably still try either that pip command or youtube-dl -U, since that issue is generally related to out-of-date extractor info and shouldn't hurt anything
[05:47] in any case, I can upload these if all else fails
[08:05] amerrykan: downloaded both of those, let me know if you want me to upload them somewhere
[08:05] all three of those*
[08:32] hi, any way to find a particular twitch vod on the internet archive?
[08:32] I am looking for the (only?) vod saved for the channel leveluplive2, with highlight ID 1854389
[08:42] sharpobje: lurk around, someone will be with you eventually
[10:53] hello! I'm developing seesaw and lua scripts for archiving the ownlog.com service - could you create an ownlog-grab github repository for it?
[12:06] arkiver, ivan` ^
[15:18] SketchCow: Looking for your opinion on something
[15:19] I run artpacks.org, and I've had someone contact me asking for full packs that they've participated in to be removed from the archive
[15:19] Because their art is being indexed by google, and contains their real name, mailing address, phone number, ex-girlfriend names, etc
[15:20] I've already added their art to robots.txt as a quick fix for this issue
[15:20] And I have no intention to remove full packs
[15:21] But I'm curious what you think about this situation
[15:22] I'd suggest adding a robots.txt rule to whitelist ia_archiver
[15:22] stevenola: how were these packs produced?
[15:22] because what you really care about is google etc
[15:22] I've written (but not yet sent) an email describing what I did with robots.txt, and offered to censor their phone numbers and personal details. I think I'm willing to remove the specific art, but I'm still curious to hear your thoughts
[15:22] were they collected from other places?
[15:22] I'd censor the personal details, since that's what they're worried about
[15:23] balrog: it's old artscene artwork. It was produced by the artists, published by an "artgroup" and distributed to many sources by the group via BBS, FTP and web.
[15:23] what'd i miss?
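On the dev-env advice at the top of this log: a minimal sketch of what pointing pipeline.py at a local tracker might look like. The names below follow the common seesaw convention but are assumptions, not any specific project's file; check your own pipeline.py for the actual variable names and use whatever IP your VirtualBox network gives the tracker VM.

    # Near the top of a typical seesaw pipeline.py; values are illustrative.
    TRACKER_ID = 'example-grab'          # hypothetical project slug
    TRACKER_HOST = '192.168.56.101'      # assumed host-only VirtualBox address

    # The pipeline then builds its tracker URLs from these, e.g.:
    #   GetItemFromTracker('http://%s/%s' % (TRACKER_HOST, TRACKER_ID),
    #                      downloader, VERSION)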
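For the whitelist suggestion above, a hedged sketch of a robots.txt that blocks general crawlers from the affected packs while leaving ia_archiver (the Wayback Machine's crawler) unrestricted; the paths are placeholders, not artpacks.org's real layout:

    User-agent: ia_archiver
    Disallow:

    User-agent: *
    Disallow: /pack/example-pack-1/
    Disallow: /pack/example-pack-2/

Per the robots.txt convention, a crawler obeys the most specific User-agent group that matches it, so the empty Disallow leaves ia_archiver free to fetch everything.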
[15:23] damn bnc
[15:25] A part of me can't help but think that it is available elsewhere, and they put it on the internet, and they knew it was publicly posted
[15:25] raylee: http://badcheese.com/~steve/atlogs/?chan=archiveteam
[15:25] DFJustin: whitelist ia_archiver globally
[15:25] aaaaaaaaa: hah, yeah
[15:26] aaaaaaaaa: yes, artpacks were basically distributed in a "hey, here's the file. pass it around!" way
[15:26] i understand the artist's concern
[15:27] just looking for other perspectives or thoughts i haven't considered
[15:28] I'd whitelist archive.org though. No use in potentially deleting it forever, and it won't show up unless you specifically look for it.
[15:29] have i done it correctly? http://artpacks.org/robots.txt
[15:30] SketchCow: Since you're familiar with the artscene, your thoughts are greatly appreciated (when you get a chance)
[15:30] looks correct to me
[15:31] if you want, someone from here could initiate a full crawl for archive.org
[15:31] (while ignoring robots.txt)
[15:31] did you delete them too? I'm getting 404s
[15:31] stevenola: you might try emailing him instead since he seems to be afk
[15:31] on some
[15:31] No worries about that. The actual content is all over. I think most of the pre-2004 content is on archive.org already
[15:33] aaaaaaaaa: ah, my script generated the urls to be blocked incofrrectly :0
[15:33] :)
[15:33] goddamn this new keyboard
[15:33] DFJustin: do you have a contact email for him?
[15:35] jason@textfiles.com
[15:37] thank you! thank you!
[17:19] stevenola: I don't think IA is indexed by Google
[17:19] so if the concern is name find-ability, that shouldn't be an issue
[17:19] err
[17:19] IA is indexed
[17:19] I meant I don't think the wayback is indexed by Google *
[17:21] Boop.
[17:22] I never respond to those.
[17:58] Ah. Maybe that would have been a good strategy
[17:58] :)
[17:59] "strategy"
[18:10] stevenola: it's the thing of the Internet being written in INK, not pencil
[18:12] preaching to the choir
[18:12] :D
[18:44] So why does the warrior lose all its data on shutdown?
[18:45] it reformats on startup
[18:45] Yes, but why?
[18:46] to make sure it has a clean slate, such that the next run doesn't run into issues with space or leftover data.
[18:46] Oh well, I have to shut down my computer sometimes and feel incredibly guilty losing you guys 1.2 gigs of data.
[18:47] you can hit the "suspend" button in virtualbox
[18:47] you can pause the virtual machine; it can usually start right back up where it left off.
[18:47] yeah
[18:47] chronomex: Oh, so that is how you're supposed to do it?
[18:47] Okay.
[18:47] If you tell the warrior to shut down using the web interface, it will shut down once the data is sent.
[18:47] ^
[18:47] but that can take a little while, depending
[18:47] Well that'll take way too long. :P
[18:47] yeah
[18:48] that's why they usually do a few release claims towards the end of a project, to make sure data lost in that manner is grabbed by someone else.
[18:49] I mean, this seems like sort of a 'gotcha' to me, and I feel like there's probably some better solution.
[18:50] Then just save the state when you close it and want to shut off the computer; unless i am missing your point.
[18:50] there is a vbox setting to have it suspend rather than shut down boxes when you shut down
[18:51] aaaaaaaaa: My point is that this isn't intuitive for a first-time user to know to do.
[18:52] which is why we have the release claims methodology.
[18:52] K.
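The suspend advice above can also be done from the command line rather than the VirtualBox GUI; a sketch, assuming the VM is named "archiveteam-warrior" (substitute whatever VBoxManage list vms shows for yours):

    # save the VM's state to disk instead of shutting it down
    VBoxManage controlvm "archiveteam-warrior" savestate

    # later, resume exactly where it left off
    VBoxManage startvm "archiveteam-warrior" --type headless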
[18:52] also why it's good practice for items to not be too big
[18:52] yeah
[18:52] ten minutes of downloading is a nice number
[18:53] Well, it doesn't help that my upload is like a soda straw compared to my download.
[18:53] I'm not sure if that's on my end or Archive.org's end.
[18:53] as it is for most home users.
[18:54] and many projects are uploaded to a staging server rather than directly to Archive.org
[18:54] I mean, obviously the warrior is rate limiting, and it's very plausible that the staging server/etc has trouble receiving the data as fast as it's being grabbed.
[18:55] maybe we should investigate overlapping the upload and download phases
[18:55] that depends on the thing being grabbed; I know that was an issue for twitch, as there were a large number of VPSs being used.
[18:55] They kinda already do overlap.
[18:55] hm, ok
[18:55] i'm not very up on things
[18:56] It starts the next download as it uploads the previous task.
[18:56] oh, yeah, i guess it does
[18:56] my bad
[18:58] hi! can anyone here create me a github repository?
[18:59] i need ownlog-grab to start a rescue of a blogging platform
[18:59] i'm almost done with the scripts and lua
[19:02] bebzol: What blogging platform?
[19:02] ownlog
[19:03] * namespace still wants to hit ravearchive
[19:03] hi, any way to find a particular twitch vod on the internet archive?
[19:03] I am looking for the (only?) vod saved for the channel leveluplive2, with highlight ID 1854389
[19:03] it's ownlog.com - a platform for about 45,000 blogs in Poland
[19:03] it's rotting away as its owners don't seem to care
[19:03] bebzol: How can I help?
[19:05] I can prepare the seesaw script and lua (almost done). I've created a list of all items to download - just don't know what to do next ;). I suppose I should put this in a github repository and send someone the item list (about 45,000 items - each item is a particular subdomain)
[19:06] you can also create your own repo, then transfer ownership over to archiveteam
[19:08] this may be an idea
[19:09] whom should I contact to do it?
[19:10] probably yipdw or chfoo
[19:10] I'd make the repo, test it with your own tracker, and then let us know when it is done. Then the admins will take a look.
[19:11] all right
[19:15] I think most of the admins are currently taking care of their day jobs.
[19:24] ^
[19:24] also, if someone knows of a good way to trace leaks in Tomcat's fucking connection pool, that would be awesome
[19:24] the logAbandoned property seems to do jack shit
[19:25] best way to trace stuff in tomcat is to shoot it with a tank.
[19:25] not an option
[19:26] bebzol: shoot me your github username, I'll get the repo and permissions set up
[19:27] it's "basement-labs"
[19:27] thanks in advance
[19:28] yipdw: are you using eclipse for tracing yet?
[19:29] midas: IntelliJ, but I suspect I can do something similar
[19:29] the production application isn't configured with remote debugging etc, though
[19:29] I guess I could turn that on
[19:29] anyway, #-bs
[19:29] yeah, let's move over there
[19:29] bebzol: invitation emailed, repo online
[19:29] thx :)
[20:26] does anyone know how to debug a pipeline script? I get info that wget failed - but no further info
[21:25] bebzol: I've never tried, but I know Python. did you try adding a "-v" to wget_args (around line 216)?
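For context on that last suggestion: seesaw pipelines usually pass wget its flags through a wget_args list, and most projects run with "-nv" (non-verbose), which is why a failing wget can die silently. A minimal sketch of the debugging change, with illustrative values rather than the real ownlog pipeline's; "around line 216" refers to that pipeline.py, not this log:

    # Fragment of a typical seesaw pipeline.py; values are illustrative.
    wget_args = [
        "./wget-lua",            # the wget+lua binary the pipeline drives
        "-U", "ArchiveTeam",     # user agent; projects define their own
        "-v",                    # verbose: swap in for the usual "-nv" while debugging
        "-o", "wget-debug.log",  # hypothetical: also capture output to a readable file
    ]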
[21:26] unfortunately - no wget output is printed - that is the problem
[21:26] but I've already resolved my problem :)
[21:28] It's probably because the exit/return code wasn't what the pipeline expected
[21:33] nah, I didn't set the variables in python - item_type and item_value. I suppose this is important :P
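For anyone hitting the same wall: many seesaw pipelines expect the tracker's item names to encode a type and a value (e.g. "blog:somename") and split them apart in a prepare step, so later tasks can read them back out of the item. A minimal sketch of that convention; the class name and item-name format are illustrative, not ownlog-grab's actual code:

    from seesaw.task import SimpleTask

    class PrepareDirectories(SimpleTask):
        """Split the tracker item name into the fields later tasks expect."""
        def __init__(self):
            SimpleTask.__init__(self, 'PrepareDirectories')

        def process(self, item):
            # the tracker hands out names like "blog:example"; the wget/lua
            # stages then reference item_type and item_value from the item
            item['item_type'], item['item_value'] = item['item_name'].split(':', 1)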
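And on the Tomcat connection-pool leak from earlier in the log: a likely reason logAbandoned appeared to do nothing is that, in both Commons DBCP and the tomcat-jdbc pool, it generally only logs when abandoned-connection checking is itself enabled. A hedged context.xml sketch (resource name, URL, and credentials are placeholders):

    <Resource name="jdbc/appdb" auth="Container" type="javax.sql.DataSource"
              factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
              removeAbandoned="true"
              removeAbandonedTimeout="60"
              logAbandoned="true"
              url="jdbc:postgresql://localhost/appdb"
              username="app" password="secret"
              driverClassName="org.postgresql.Driver"/>

The tomcat-jdbc pool also offers suspectTimeout, which logs long-held connections as suspected leaks without forcibly closing them, which is gentler on a production app.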