[00:16] *** ItsYoda has quit IRC (Ping timeout: 260 seconds)
[00:25] *** j08nY has quit IRC (Read error: Connection reset by peer)
[00:29] *** ItsYoda has joined #archiveteam-bs
[01:14] *** fie has joined #archiveteam-bs
[02:34] *** mutoso has joined #archiveteam-bs
[03:09] *** Sum has joined #archiveteam-bs
[03:09] can individual Twitter channels ask for archive.org exclusions?
[03:13] Sum: what do you mean?
[03:14] (First of all, see the topic in #archiveteam: "We are not the Internet Archive")
[03:34] JesseW, I mean whether individual Twitter channel pages can ask for web archive exemptions
[03:35] and yeah I know, but figured someone here would probably know
[03:37] eg, if https://twitter.com/BillGates could say 'hey, I don't want archived results to appear'
[03:37] figured they couldn't but wasn't sure
[03:44] well, anyone can email info@archive.org and ask for whatever they feel like. And likely, if they can show that they are the author of the material in question, and ask IA to exclude it from the Wayback Machine, IA will cease distributing it through the Wayback Machine. IA has generally made it pretty clear that they don't intend to host things against the expressed desires of the material's authors or copyright holders.
[03:45] Whether particular twitter channels can *implicitly* cause things to be excluded via modifying a robots.txt file -- well, each twitter channel isn't on a separate subdomain, so, they'd have to change the file at twitter.com/robots.txt -- and as far as I know, twitter doesn't currently support that.
[03:46] Sum: that should probably answer your question (although I don't work for IA, I'm just some random who hangs around here)
[03:47] *** Sum has quit IRC (Ping timeout: 370 seconds)
[03:49] well
[03:49] he probably didn't get any of that :|
[03:51] now, whether IA might happen to privately hold on to bit patterns that they have decided not to distribute after receiving requests to exclude them, in the expectation that the copyright will eventually expire, and the authors eventually stop caring (i.e. by dying) ... that's *COMPLETELY UNKNOWABLE* and we have no opinion on it. :-)
[03:51] Frogging: yeah, probably not -- but it's there in the logs if they happen to care to look
[04:19] *** robink has joined #archiveteam-bs
[04:47] *** tomwsmf-a has quit IRC (Read error: Operation timed out)
[04:48] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[04:53] Hmm...I'm trying to look at Wayback Machine's CDX to see if any http://www.portalgraphics.net/pg/illust/?image_id= pages are missing, but it does not want to work with me
[04:54] http://web.archive.org/cdx/search/cdx?url=www.portalgraphics.net/pg/illust/?image_id=* shows nothing at all
[04:56] DoomTay: try urlencoding
[04:56] Nope
[04:56] *** Sk1d has joined #archiveteam-bs
[04:57] This gives results: https://web.archive.org/cdx/search/cdx?url=www.portalgraphics.net/pg/illust/?
[04:57] DoomTay: you have read https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server, right?
[04:57] Yeah, I'm looking at it right now
[04:59] matchType=prefix returns more results
[05:09] Still not the kind of thing I'm specifically looking for. Even http://web.archive.org/web/*/http://www.portalgraphics.net/pg/illust/?image_id=* is useless
[05:14] What would a page like that be linked *from*?
[05:14] You could try going there
[05:15] http://www.portalgraphics.net/pg/ would be a starting point
[05:18] https://web.archive.org/web/20160505195812/http://portalgraphics.net/pg/illust/?image_id=90235
[05:20] Sure, that works. But I still don't know how I'm going to count from 1 to 90322 to make sure nothing was skipped over, and especially to make sure that the latest of each isn't an "all pipes are full" page
[05:20] yeah, it's odd that doesn't seem to show up in the cdx
[05:25] Guess all I can do is switch gears.
[05:25] Hey, arkiver, what was that URL you found that points to part a flash file used?
[05:25] I remember it contained the word "attributes"
[05:25] DoomTay: I think I got something that works
[05:26] https://web.archive.org/cdx/search/cdx?matchType=prefix&url=portalgraphics.net/pg/illust/&output=json&limit=5&filter=original:.*image_id=.*
[05:26] seems to give results
[05:26] adjust the limit for more
[05:31] Agh, that still just gives "download" pages and "list.php", whatever those are
[05:31] Maybe I should get to finding a way to save more "download" pages anyway
[05:32] That's gonna be fun...
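The gap check raised at [05:20] (making sure no image_id between 1 and 90322 was skipped) could be sketched roughly as below. This is only an illustration, not something anyone in the channel posted: the function name `missing_ids` is hypothetical, and it assumes the input is a plain list of captured URLs, one per line, such as the `original` column of a CDX result.

```shell
#!/bin/sh
# Hypothetical helper (not from the log): print every image_id in 1..MAX
# that does not appear in a list of captured URLs read from stdin.
# ASSUMPTION: input is one URL per line, e.g. the "original" CDX column.
missing_ids() {
    max="$1"
    # Extract the numeric image_id from each URL; lines without one are dropped.
    sed -n 's/.*[?&]image_id=\([0-9][0-9]*\).*/\1/p' |
    awk -v max="$max" '
        { seen[$1 + 0] = 1 }                      # remember each id we saw
        END {
            for (i = 1; i <= max; i++)            # walk the full expected range
                if (!(i in seen)) print i         # report ids with no capture
        }
    '
}
```

Piping the output of a prefix CDX query into `missing_ids 90322` would then list the ids with no capture at all. It would not detect captures that are only an "all pipes are full" error page; that needs a look at the captured content itself.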
[05:42] Oh, there we go https://web.archive.org/cdx/search/cdx?matchType=prefix&url=portalgraphics.net/pg/illust/.?image_id
[05:43] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:45] *** Sum has joined #archiveteam-bs
[05:46] *** dashcloud has joined #archiveteam-bs
[06:00] *** dashcloud has quit IRC (Read error: Operation timed out)
[06:03] *** dashcloud has joined #archiveteam-bs
[06:05] *** Spring has joined #archiveteam-bs
[06:07] *** Sum has quit IRC (Read error: Operation timed out)
[06:12] *** Spring has quit IRC (Read error: Operation timed out)
[06:13] *** Spring has joined #archiveteam-bs
[06:45] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[06:51] *** Spring has quit IRC (Ping timeout: 370 seconds)
[06:52] *** Spring has joined #archiveteam-bs
[07:02] *** DoomTay has quit IRC (Ping timeout: 268 seconds)
[07:14] *** BlueMaxim has quit IRC (Quit: Leaving)
[07:18] *** DoomTay has joined #archiveteam-bs
[07:19] !pending
[07:22] *** Spring has quit IRC (Ping timeout: 370 seconds)
[07:23] *** Spring has joined #archiveteam-bs
[07:36] *** BlueMaxim has joined #archiveteam-bs
[07:43] *** Spring has quit IRC (Ping timeout: 370 seconds)
[07:43] *** DoomTay has quit IRC (Quit: Page closed)
[07:44] *** Spring has joined #archiveteam-bs
[08:04] found this, thought i'd link it for you, https://hardware.slashdot.org/story/16/07/04/0342211/the-fight-to-save-the-australian-digital-archive-trove
[08:42] Can someone ban Waqar42 from the Wiki please?
[09:16] *** sigkell_ has quit IRC (Read error: Connection reset by peer)
[09:18] *** SilSte has quit IRC (Ping timeout: 194 seconds)
[09:18] *** SilSte has joined #archiveteam-bs
[09:55] so i wasn't crazy when i was grabbing pdfs from the trove: http://www.abc.net.au/news/2016-03-12/future-of-national-librarys-trove-online-database-in-doubt/7242182
[09:56] the problem now is i can't autobot it like i used to
[10:02] GOOD NEWS EVERYONE
[10:02] you can brute force the pages: http://trove.nla.gov.au/newspaper/rendition/nla.news-page5653.pdf
[10:02] bad news is you can't just download issues
[10:07] maybe we can get issues: http://trove.nla.gov.au/newspaper/rendition/nla.news-issue57/prep
[10:07] * CatButts salutes channel and disappears in a cloud of farts
[10:08] *** CatButts has quit IRC (Quit: Here is my journey's end, here is my butt.)
[10:09] so i figure it how
[10:09] *out
[10:10] export x=$(curl -s http://trove.nla.gov.au/newspaper/rendition/nla.news-issue57/prep)
[10:10] wget -c "http://trove.nla.gov.au/newspaper/rendition/nla.news-issue57.pdf?followup=${x}"
[10:16] how i grab metadata on pdf : curl -s -L http://trove.nla.gov.au/newspaper/issue/56 | grep 'data-dismiss="modal"' | grep -v aria-hidden | sed 's|, Page 1.*||g' | sed 's|.*">||g'
[10:17] now am i going to be doing this grab? the answer is NO
[10:17] but all of the code for doing a warrior project for it is there
[10:29] godane: I think I follow what you're saying. Roughly how many items do you think there will be?
[10:30] i have no idea on the max number
[10:54] *** Spring has quit IRC (Read error: Operation timed out)
[11:25] Igloo godane http://trove.nla.gov.au/system/counts
[11:27] 500 million items
[11:28] ok then
[11:28] maybe you can get them to write them onto tape for you?
[11:28] that code just focused on the newspaper issues anyways
[11:28] Depends what we want to get. If we just want the papers then it's easy enough
[11:33] i figure grabbing the newspaper first then work on the other stuff later
[11:33] we also have to know how many TB the Trove is
[11:34] cause i maybe too big for the Internet Archive
[11:34] *it maybe
[11:41] my pipeline hangs a lot at the end of jobs
[11:44] !queue
[11:45] * luckcolor facepalm wrong channel
[11:46] does anybody here have the latest wget-lua release link?
[11:46] godane: we should do a warrior project for trove
[11:49] *** PurpleSym sets mode: +o arkiver
[11:51] thanks :)
[11:52] I'm looking forward to the trove project
[11:52] It could be massive
[11:55] Haha
[11:55] We've had bigger projects :)
[11:55] getting all the data :P
[11:56] There are archived websites on there too
[11:57] it would be great if we can find a way to grab the WARCs of those archived websites
[11:58] *** Spring has joined #archiveteam-bs
[11:59] arkiver: i suppose they don't store those websites in that format
[11:59] URLs look very similar to the wayback machine http://pandora.nla.gov.au/nph-wb/20000928130000/http://www.olympics.com/eng/index.html
[12:00] wish there were a way to d/l recently made private videos on youtube
[12:00] guy on YT made a bunch of his videos private following a scandal
[12:01] one of the reasons I asked about the Twitter thing earlier
[12:01] try google cache
[12:01] he has since made his account private, which effectively removes his tweets from being crawled
[12:01] Spring: who was it?
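For reference, the two-step Trove issue download posted at [10:10] could be wrapped up along these lines. This is a rough sketch under stated assumptions, not the channel's actual tooling: the function names are hypothetical, the endpoints are copied from the log as they were quoted in 2016 and may no longer behave this way, and the idea that the /prep response body is the token to pass as `followup` is inferred from the two commands in the log.

```shell
#!/bin/sh
# Hypothetical wrappers around the two commands from [10:10].
# ASSUMPTION: URL patterns are exactly those quoted in the log.

# Build the "prep" URL for a newspaper issue number.
trove_prep_url() {
    printf 'http://trove.nla.gov.au/newspaper/rendition/nla.news-issue%s/prep\n' "$1"
}

# Build the PDF URL for an issue number and a followup token.
trove_pdf_url() {
    printf 'http://trove.nla.gov.au/newspaper/rendition/nla.news-issue%s.pdf?followup=%s\n' "$1" "$2"
}

# Fetch one issue: request the token, then download the PDF with it.
# ASSUMPTION: the body of the /prep response is the token, as in the log.
fetch_issue() {
    token=$(curl -s "$(trove_prep_url "$1")") || return 1
    [ -n "$token" ] || return 1                  # no token, nothing to fetch
    wget -c "$(trove_pdf_url "$1" "$token")"     # -c resumes partial downloads
}
```

Calling `fetch_issue 57` would repeat the example from the log; a loop over issue numbers would need rate limiting and error handling that this sketch leaves out.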
[12:02] midas, for those with direct URLs and those which google has cached very recently yes
[12:02] midas, this guy https://twitter.com/TmarTn
[12:02] looks like it's unprivate again
[12:03] still has deleted some tweets though
[12:04] he came into controversy hours ago for being revealed to be running what is essentially an underage gambling site for CSGO, promoting videos to his viewers of 'winning' thousands in mere minutes, and not once disclosing he owned the site
[12:05] along with 3 other big youtubers
[12:05] that's the stuff i like
[13:00] *** Spring has quit IRC (Read error: Operation timed out)
[13:01] *** Spring has joined #archiveteam-bs
[13:22] *** BlueMaxim has quit IRC (Quit: Leaving)
[14:03] *** Start has quit IRC (Quit: Disconnected.)
[14:30] *** Spring has quit IRC (Read error: Operation timed out)
[14:31] *** Spring has joined #archiveteam-bs
[14:38] chfoo: can you please add #internetarchive, #jsmess and #newsgrabberbot to logchfoo?
[14:38] I tried inviting the bot some time ago, but it does not record the channels anymore after a restart
[14:45] *** dashcloud has quit IRC (Read error: Operation timed out)
[14:49] *** dashcloud has joined #archiveteam-bs
[15:33] *** zhongfu has quit IRC (Remote host closed the connection)
[15:34] *** dude1 has joined #archiveteam-bs
[15:35] *** Spring has quit IRC (Read error: Operation timed out)
[15:35] Hey I figured this would be the best place to go, microsoft seems to have killed its digitalriver download links for vista and I need to get my hands on the ISO since a friend didn't have a recovery disc, does anybody know if the archive team had made backups of those ISOs they provided?
[15:36] *** metalcamp has joined #archiveteam-bs
[15:36] dude1, hang on
[15:36] looks like i have a collection of something that's not uploaded yet
[15:36] dude1, http://mirror.corenoc.de/digitalrivercontent.net/ they are all torrents
[15:36] Radio Control Car Action Magazine
[15:37] HCross: THANK YOU
[15:37] np
[15:37] Wait
[15:37] Those are all win 7 >.>
[15:37] what OS you need?
[15:38] vista
[15:38] It's pro but an ultimate ISO will work since you can just make a few tweaks to ultimate to make it show you the options to choose which version you want
[15:39] https://www.raymond.cc/blog/how-to-burn-downloaded-windows-vista-to-dvd/
[15:39] Thank you
[15:40] And those are duds as well
[15:40] :/
[15:41] *** Spring has joined #archiveteam-bs
[15:41] Originals are the dead links and the alternatives aren't the actual files, boot.wim is supposed to be 2.7 gigs not 127MB
[15:41] Why did they have to kill those links, they were so handy
[15:42] Nevermind, got it
[15:42] *** VADemon has joined #archiveteam-bs
[15:42] Thanks for your help HCross
[15:43] no problem
[15:44] Hopefully I don't have to come back lol, you guys might want to make an archive of those files some time soon
[15:44] *** dude1 has quit IRC (Quit: Page closed)
[16:40] *** mls has quit IRC (Quit: leaving)
[17:07] *** JesseW has joined #archiveteam-bs
[17:23] *** DoomTay has joined #archiveteam-bs
[17:33] *** SN4T14 has quit IRC (Quit: Leaving)
[17:34] *** lytv has quit IRC (Quit: Leaving)
[17:35] *** ArgyroNet has joined #archiveteam-bs
[17:35] hi
[17:35] so If I keep asking, if I may
[17:35] hello! yes, please continue
[17:35] is there a nice community for image-sharing that you'd recommend, with good searchability ?
[17:36] free, nice with property
[17:36] By free I mean in the freeware meaning
[17:37] Wikimedia Commons is more likely to use your images (especially if they are of things like a lake) -- but they may also end up deleting (i.e. stopping distributing) them if they later decide they don't meet their (changing) criteria
[17:37] Uuhm, I doubt that it'd be appropriate then
[17:37] So probably uploading to both archive.org and Wikimedia Commons would be better
[17:37] just use flickr and enjoy it while it lasts
[17:37] that too
[17:39] ArgyroNet: so what lake or lakes are they pictures of?
[17:39] oh, they are of many places
[17:40] but atm I was thinking of this place: http://olaaa.fr/USERS/61/916d1f0d831dba097988c640169adcb1.JPG
[17:40] a highly touristic one, but still a nice one
[17:42] neat
[17:42] yes, please do upload them to archive.org
[17:43] I will then ;)
[17:43] I didn't take that many pics of the place itself, but a lot of the ground
[17:43] I was fascinated by the texture of some dried algae on this red ground
[17:46] algae are often fascinating
[17:46] it's hard to tell by the look what it is, in this case :D
[17:46] do you like nature pictures ?
[17:47] generally
[17:49] *** lytv has joined #archiveteam-bs
[17:49] https://archive.org/search.php?query=subject%3A%22Radio+Control+Car+Action+Magazine%22
[17:51] *** Aranje has joined #archiveteam-bs
[17:52] btw i'm up to 743k items
[17:53] i'm starting to upload the 1970 nasa docs
[17:53] there are 3192 pdfs just for that year
[17:58] *** Sk1d has quit IRC (Ping timeout: 194 seconds)
[17:59] looks like i may be able to get Model Airplane News pdfs
[18:06] *** Sk1d has joined #archiveteam-bs
[18:07] *** DoomTay has quit IRC (Ping timeout: 268 seconds)
[18:28] I'm on it, joepie91 :)
[18:28] :)
[18:28] where are you from, joepie91 ? Your name sound like a nordic one !
[18:28] *sounds
[18:30] Netherlands :P
[18:30] Sven is a surprisingly popular name here
[18:30] like, top 10 most popular for the past two decades or so
[18:33] nice to meet you ! And got to go huhu
[18:33] see you later/another time
[18:33] and I hope Netherlands is better with freedom of expression than france is :)
[18:38] *** zhongfu has joined #archiveteam-bs
[18:44] *** zhongfu has quit IRC (Remote host closed the connection)
[18:47] has anyone used Samsung SSDs personally? if so I'd appreciate any reports on how they hold up
[18:48] for C++ / webapp development, so lots of object file and log-writing
[18:50] I'm not that concerned about performance -- I'll have to hook it all up via SATA II so the interconnect is likely a bottleneck anyway; however things like sudden failures would be cool to know
[18:57] yipdw: 128 GB 830 Series here with 10000 hours and 1TB(?) written. No problems so far.
[18:59] *** zhongfu has joined #archiveteam-bs
[19:07] I have an 850 Evo
[19:07] I've had it for about a year. I don't know how many writes it's taken but it works great
[19:08] 850 Evo 250GB that is
[19:09] yipdw, 840 evo 500gb: 1yr 306d runtime, only listed fail is 1 CRC error count
[19:09] 41.3T written
[19:10] where do you get the amount written? I don't see that in smart
[19:10] Total_LBAs_Written, Frogging
[19:12] oh, it wasn't showing up in the Disks tool of my desktop environment
[19:12] smartctl shows it
[19:14] anyone have an easy way to poll total written on all connected drives? :P
[19:14] 7.2TB written, 445 days powered on, no errors
[19:15] that 7.2TB number is pretty surprising to me. I wouldn't have thought it'd be that high
[19:16] Fletcher: script+smartctl? :p
[19:17] probably the easiest way
[19:18] oh it does show up in the Disks application, but value says "N/A". lol
[19:18] *** ravetcofx has quit IRC (Ping timeout: 506 seconds)
[19:20] *** ravetcofx has joined #archiveteam-bs
[19:20] also, none of my WD hard disks have a write counter. only the SSD and Seagate HDDs
[19:26] I'd take the stats for regular drives with a grain of salt though
[19:26] Frogging: Some devices use a different counter shown with smartctl -x.
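The "script+smartctl" idea from [19:16] might look something like the sketch below. It is an illustration only: the function names are hypothetical, it assumes the drive reports `Total_LBAs_Written` in the `smartctl -A` attribute table with the raw value in the tenth column, and it assumes one LBA is 512 bytes, which does not hold for every drive (some count in other units, and as noted above some expose no such counter at all).

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# total bytes written, derived from the Total_LBAs_Written raw value.
# ASSUMPTION: one LBA = 512 bytes unless another size is passed as $1.
lbas_to_bytes() {
    awk -v sector="${1:-512}" '
        $2 == "Total_LBAs_Written" { printf "%.0f\n", $10 * sector }
    '
}

# Hypothetical wrapper: poll every /dev/sd? block device (usually needs root).
# Drives without the attribute are reported as n/a.
poll_all_drives() {
    for dev in /dev/sd?; do
        [ -b "$dev" ] || continue                 # skip if the glob matched nothing
        bytes=$(smartctl -A "$dev" | lbas_to_bytes)
        printf '%s\t%s\n' "$dev" "${bytes:-n/a}"
    done
}
```

Running `poll_all_drives` as root would print one line per drive. Given the caveat about regular hard disks raised in the channel, the numbers are best treated as rough indications.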
[19:26] Fletcher: why's that?
[19:26] from a casual glance at least one of my seagate drives is reporting 1T written
[19:27] too low?
[19:27] except this drive is a) 4+ years old and b) currently has 2tb of data on it
[19:27] heh
[19:30] PurpleSym: I don't see one. Maybe it's under this section which isn't supported:
[19:30] Device Statistics (GP Log 0x04) not supported
[19:30] Yeah, could be.
[19:35] *** DoomTay has joined #archiveteam-bs
[19:41] *** tomwsmf-a has joined #archiveteam-bs
[19:48] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[19:51] *** metalcamp has joined #archiveteam-bs
[20:16] PurpleSym: cool, good to know. I was looking at their 850 Evo
[20:16] newegg reviews are very positive but there wasn't much info on "is it gonna just poof"
[20:16] * yipdw is still running on two 80 GB X-25Ms
[20:18] I have an 850 Evo and it's pretty great
[20:24] *** JesseW has quit IRC (Ping timeout: 370 seconds)
[20:31] *** ErkDog has quit IRC (Read error: Operation timed out)
[20:31] *** ErkDog has joined #archiveteam-bs
[20:41] *** SN4T14 has joined #archiveteam-bs
[20:46] I too have an 850 evo (Actually 5) and they're good SSDs
[20:48] *** dashcloud has quit IRC (Remote host closed the connection)
[20:49] Which channel is thomas running in? I'm seeing some errors go past. ( arkiver )?
[20:49] *** dashcloud has joined #archiveteam-bs
[21:06] thomas-billid_1928-20160704-215818_data.txt 32,768 47% 0.00kB/s 0:00:00 68,639 100% 34.21MB/s 0:00:00 (xfr#1, to-chk=0/1)
[21:06] rsync: chgrp "/.thomas-billid_1928-20160704-215818_data.txt.5buzBd" (in thomasdiscovery) failed: Operation not permitted (1)
[21:07] Just for arkiver when he gets a moment. I think the download / upload completes successfully but it is definitely erroring
[21:07] I've actually never had a disk go poof
[21:07] I did years ago, Not a nice feeling :(
[21:08] any data loss I've experienced over the years was my fault
[21:08] carelessness, etc
[21:08] *** metalcamp has quit IRC (Ping timeout: 244 seconds)
[21:08] accidentally turning my drive into swap space with Ubuntu's installer. hah. good times.....
[21:08] I've got one failing in my PC at the moment, But there is no live data on it anymore
[21:08] Just waiting for time to power it down and actually remove it
[21:08] that's... unfortunate
[21:09] to be honest if I had understood how file systems work, I probably wouldn't have lost everything. I doubt it actually scrubs the data when you do that
[21:09] so it was probably all there still
[21:10] I think it just re-writes the partition data
[21:10] Doesn't do a low level format I don't think
[21:10] But recovery would be a pain in the backside
[21:10] it's likely the same deal with the time I resized my Windows partition when it was hibernated, and then ran Windows. the system died slowly and all my files were, apparently, gone. but I wonder if they were actually gone
[21:13] meh, I was like 12 years old. I'm more competent now :p
[21:14] You live and learn
[21:14] I *still* don't keep organized backups however. I need to work on that
[21:15] I do now
[21:15] I use crash plan + seed all the data to two servers
[21:15] a lot of people learn the hard way I think. I want to avoid that
[21:19] *** JesseW has joined #archiveteam-bs
[21:38] *** SilSte has quit IRC (Ping timeout: 194 seconds)
[21:38] *** SilSte has joined #archiveteam-bs
[21:42] *** Start has joined #archiveteam-bs
[22:57] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:00] *** dashcloud has joined #archiveteam-bs
[23:15] *** ArgyroNet has quit IRC (Quit: KVIrc 4.2.0 Equilibrium http://www.kvirc.net/)
[23:17] I've had a few disks just go poof
[23:18] one was an OCZ Vertex3, two were Seagate Barracudas in a RAID-Z2 array
[23:18] none were more than annoying, but then it's still annoying
[23:46] *** BlueMaxim has joined #archiveteam-bs
[23:48] *** DoomTay has quit IRC (Ping timeout: 268 seconds)
[23:51] *** dashcloud has quit IRC (Read error: Operation timed out)
[23:52] *** DoomTay has joined #archiveteam-bs
[23:55] *** dashcloud has joined #archiveteam-bs